Gene encoding acetyl-coenzyme A carboxylase

ABSTRACT

A DNA encoding an acetyl-coenzyme A carboxylase (ACCase) from a photosynthetic organism and functional derivatives thereof which are resistant to inhibition from certain herbicides. This gene can be placed in organisms to increase their fatty acid content or to render them resistant to certain herbicides.

The United States Government has rights in this invention pursuant to Contract No. DE-AC02-83CH10093 between the United States Department of Energy and the Midwest Research Institute.

This is a continuation of application Ser. No. 08/120,938 filed Sep. 14, 1993, now abandoned.

BACKGROUND OF THE INVENTION

FIELD OF THE INVENTION

The invention relates to a cloned gene which encodes an enzyme, its uses and products resulting from its use.

RELATED WORK TO THE INVENTION

Lipids, particularly triglycerides, have a great deal of commercial value in food and industrial products. Sunflower, safflower, rape, olive, soybean, peanut, flax, castor, oil palm, coconut and cotton are examples of major crops which are grown primarily or secondarily for their lipids. All agricultural animals provide animal sources for commercial fats and oils.

Recently, agriculturally produced triglycerides have even been proposed for use as a diesel fuel. These products are biodegradable and are less polluting than their fossil fuel counterparts. Their primary drawback is cost. Consequently, there has been considerable research to improve the yields of lipids from agricultural sources.

In an attempt to enhance production of oils in plants, the acyl carrier protein gene has been cloned so that the gene may be overexpressed in hopes of increasing production. See U.S. Pat. No. 5,110,728. While acyl carrier protein is involved in the biosynthesis of lipids, it is not believed to be the rate-limiting component. Thus it is not clear whether organisms containing such a cloned gene would increase production of lipids as the result of having multiple gene copies.

In the biosynthesis of fatty acids in bacteria, animals, yeast, and plants, the first step is catalyzed by the enzyme Acetyl-Coenzyme A carboxylase, hereafter ACCase. This enzyme catalyzes the carboxylation of acetyl-CoA to form malonyl-CoA. The reaction involves two partial reactions: 1) carboxylation of an enzyme-bound biotin molecule to form a carboxybiotin-enzyme complex and 2) transfer of the carboxyl group to acetyl-CoA. ACCase catalyzes the primary regulatory or rate-limiting step in the biosynthesis of fatty acids.

In bacteria such as Escherichia coli, the ACCase has four distinct, separable protein subunit components: a biotin carboxyl carrier protein, a biotin carboxylase and two subunits of carboxyltransferase. In eukaryotes, ACCase is composed of multimers of a single multifunctional polypeptide having a molecular mass typically greater than 200 kDa (Samols et al, J. Biol. Chem. 263: 6461-6464 (1988)). These multimers have molecular masses ranging from 400 kDa to 8 MDa.

Some confusion exists as to the size of ACCase from plants. Large (>200 kDa) subunits have been reported for several plants. See, e.g., Roessler, Plant Physiology 92: 73-78 (1990); Egli et al, Plant Physiol. 101: 499-506 (1993); Livne et al, Plant Cell Physiol. 31: 851-858 (1990); Charles et al, Phytochemistry 25: 1067-1071 (1986); Slabas et al, Plant Science 39: 177-182 (1985); Egin-Buhler et al, Eur. J. Biochem. 133: 335-339 (1983). Wurtele et al (Arch. Biochem. Biophys. 278: 179-186 (1990)) suggest that plants may also have an ACCase made up of much smaller subunits.

In animals, ACCase has been shown to catalyze the rate-limiting step in fatty acid biosynthesis. See, e.g., Kim et al, FASEB J. 3: 2250-2256 (1989) and Lane et al, Current Topics in Cellular Recognition, Horecker et al, ed. (Academic Press, N.Y.) 8: 139-195 (1974). Regulation of the level of gene expression has been shown to be an important determinant of fatty acid biosynthetic rates in animals (Katsurada et al, Eur. J. Biochem. 190: 435-441 (1990); Pape et al, Arch. Biochem. Biophys. 267: 104-109 (1988)). This same enzyme has recently been proposed to determine the rates of fatty acid synthesis in plants as well (Post-Beittenmiller et al, J. Biol. Chem. 266: 1858-1865 (1991) and Post-Beittenmiller et al, Plant Physiol. 100: 923-930 (1992)). However, nothing is known about the regulation of plant ACCase gene expression.

In addition to the enzyme being well characterized in many species, the genes coding for ACCase and its subunits have been cloned from rat, chicken, yeast and E. coli. See Lopez-Casillas et al., Proc. Natl. Acad. Sci. U.S.A. 85: 5784-5788 (1988); Takai et al., J. Biol. Chem. 263: 2651-2657 (1988); Al-Feel et al, Proc. Natl. Acad. Sci. U.S.A. 89: 4534-4538 (1992); Li et al., J. Biol. Chem. 267: 855-863 (1992); Li et al., J. Biol. Chem. 267: 16841-16847 (1992); Kondo et al, Proc. Natl. Acad. Sci. U.S.A. 88: 9730-9733 (1991) and Alix, DNA 8: 779-789 (1989). However, as mentioned above, considerable variability in the structures of the encoded enzymes has been noticed.

ACCase has been purified from several species of plants and algae. See, e.g., Roessler, Plant Physiology 92: 73-78 (1990); Egli et al, Plant Physiol. 101: 499-506 (1993); Livne et al, Plant Cell Physiol. 31: 851-858 (1990); Charles et al, Phytochemistry 25: 1067-1071 (1986); Slabas et al, Plant Science 39: 177-182 (1985); Nikolau et al, Arch. Biochem. Biophys. 228: 86-96 (1984); Egin-Buhler et al, Eur. J. Biochem. 133: 335-339 (1983) and Finlayson et al, Arch. Biochem. Biophys. 225: 576-585 (1983). The genes encoding ACCase from these and other photosynthetic organisms have not been cloned. Nikolau et al, EP 469,810, have reported cloning a 50 kDa "subunit" from carrots. However, this is clearly not large enough to be a full-length copy of the gene.

Cyclotella cryptica is a diatom which is photosynthetic and can potentially produce up to half of its mass as lipids (Weissman et al, Biotech. Bioeng. 31: 336-344 (1988)). C. cryptica is capable of culture outdoors in saline groundwater which is unsuitable for normal agricultural crops. Calculations have indicated that theoretically, C. cryptica could produce more lipids than are currently produced by agricultural oilseeds. As such, C. cryptica has been considered as a potential organism for producing lipids.

Previous research has suggested that increased levels of ACCase gene expression may be responsible for enhanced ACCase activity in nutrient-deficient, lipid-accumulating C. cryptica cells (Roessler, Arch. Biochem. Biophys. 267: 521-528 (1988)). However, before the present invention, this hypothesis could not be tested. Furthermore, other than changing the culturing medium, no other mechanism for regulating expression existed.

In order for this natural alga to accumulate large amounts of lipids, nutrient-limiting conditions have been used. See Roessler, Arch. Biochem. Biophys. 267: 521-528 (1988) and Werner, Arch. Mikrobiol. 55: 278-308 (1966). The limiting nutrient was silicon or nitrogen. The activity of the ACCase doubled after 4 hours of silicon deficiency and increased four-fold after 15 hours. The exact mechanism by which nutrients control ACCase activity is unknown.

SUMMARY OF THE INVENTION

An object of this invention is to produce large quantities of lipids, particularly triglycerides, at lower cost.

Another object of the present invention is to develop plants and other organisms which overproduce lipids in order to produce lipids at lower cost.

Still another object of this invention is to generate plants which are herbicide resistant so that weeding of a field can be performed efficiently.

Yet another object of the present invention is to prepare a selectable marker for use in plant breeding.

To accomplish these goals, the gene for ACCase from C. cryptica has been cloned. The gene may be expressed in C. cryptica to increase the copy number of the ACCase gene or to place the gene under different regulatory control. Alternatively, the ACCase gene may be expressed in other organisms such as bacteria, yeast, plants and algae, so that the lipid compositions of the organisms are altered.

The ACCase produced by the cloned gene is resistant to the effects of certain herbicides. Thus, the gene can serve as a marker by imparting herbicide resistance to a recipient cell which is normally herbicide sensitive. This has certain advantages in plant breeding and in weeding a field of plants.

BRIEF DESCRIPTION OF THE DRAWINGS

FIGS. 1A-1B are a homology plot comparing the deduced amino acid sequence of C. cryptica ACCase with the sequences of rat and yeast ACCases. The areas marked are where seven or more amino acids out of ten are identical in the two sequences being compared.
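The marking criterion used in such a homology plot can be illustrated with a short sketch. This is only an assumed reading of the figure legend (an ungapped sliding window of ten residues, marked when seven or more positions are identical); the function name `window_matches` and its parameters are hypothetical and are not taken from the specification.

```python
def window_matches(seq_a, seq_b, window=10, min_identical=7):
    """Return (i, j) start positions where at least `min_identical` of
    `window` aligned residues are identical between the two sequences.

    Assumption: the plot uses a simple ungapped sliding window.
    """
    hits = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            # Count identical residues at matching offsets in the two windows.
            identical = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
            if identical >= min_identical:
                hits.append((i, j))
    return hits


if __name__ == "__main__":
    # Toy sequences only; these are not ACCase sequences.
    print(window_matches("AAAAAAAAAA", "AAAAAAACCC"))
```

Each returned pair corresponds to one marked point in a plot of one sequence against the other. The quadratic scan is adequate for a sketch but would be slow for full-length sequences of about 2000 residues.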

FIGS. 2A-2C show a comparison of the amino acid sequences of ACCase from four different species. The portion of ACCase that binds to carboxybiotin is believed to correspond to A. The acetyl-CoA binding region is believed to correspond to B. The ATP binding region is believed to correspond to C. The amino acid sequences are provided in computer readable form as SEQ ID NO:1 to SEQ ID NO:12.

DESCRIPTION OF THE PREFERRED EMBODIMENT

The gene for ACCase encodes a 2089 amino acid protein having a molecular mass of 230 kDa. The gene also contains a 447-base pair intron near the putative translation initiation codon and a 73-base pair intron slightly upstream from the region of the gene that encodes the biotin binding site of the enzyme. A signal sequence is present in the enzyme which resembles that capable of transporting proteins into a chloroplast or other plastid via the endoplasmic reticulum.
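As a rough consistency check on the figures above, the stated mass can be compared with a back-of-the-envelope estimate. The sketch below assumes an average residue mass of about 110 Da, a common rule of thumb that does not appear in the specification, and assumes one stop codon when computing the coding length; the function names are hypothetical.

```python
# Assumed rule-of-thumb average mass per amino acid residue (not from the source).
AVERAGE_RESIDUE_MASS_DA = 110.0


def approximate_mass_kda(residue_count, residue_mass_da=AVERAGE_RESIDUE_MASS_DA):
    """Estimate protein mass in kDa from the number of residues."""
    return residue_count * residue_mass_da / 1000.0


def coding_length_bp(residue_count, include_stop=True):
    """Length in base pairs of the coding sequence, three bases per codon."""
    codons = residue_count + (1 if include_stop else 0)
    return codons * 3


if __name__ == "__main__":
    print(approximate_mass_kda(2089))  # estimated mass in kDa
    print(coding_length_bp(2089))      # coding bases, including one stop codon
    print(447 + 73)                    # total intron length in base pairs
```

Under these assumptions the estimate is about 229.8 kDa, in line with the stated 230 kDa, and the two introns together add 520 base pairs to a coding sequence of 6270 base pairs.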

The ACCase gene was cloned using standard recombinant DNA techniques. Variations on these techniques are well known and may be used to reproduce the invention. Techniques for transforming host cells, expressing the gene and altering the host organism are also known and are used in accordance with the present invention.

Standard reference works setting forth the general principles of recombinant DNA technology and cell biology include Watson, J. D., et al., Molecular Biology of the Gene, Volumes I and II, Benjamin/Cummings Publishing Co., Inc., Menlo Park, Calif. (1987); Darnell, J. E. et al., Molecular Cell Biology, Scientific American Books, Inc., New York, N.Y. (1986); Lewin, B. M., Genes II, John Wiley & Sons, New York, N.Y. (1985); Old, R. W. et al., Principles of Gene Manipulation: An Introduction to Genetic Engineering, 2nd Ed., University of California Press, Berkeley, Calif. (1981); Maniatis, T., et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Press, Cold Spring Harbor, N.Y. (1982); Sambrook, J. et al., Molecular Cloning: A Laboratory Manual, 2nd Edition, Cold Spring Harbor Press, Cold Spring Harbor, N.Y. (1989) and Alberts, B. et al., Molecular Biology of the Cell, 2nd Ed., Garland Publishing, Inc., New York, N.Y. (1989). These references and all other references mentioned in this application are herein incorporated by reference.

By "cloning" is meant the use of in vitro recombination techniques to insert a particular gene or other DNA sequence into a vector molecule. In order to successfully clone a desired gene, it is necessary to employ methods for generating DNA fragments, for joining the fragments to vector molecules, for introducing the composite DNA molecule into a host cell in which it can replicate, and for selecting the clone having the target gene from amongst the recipient host cells.

By "cDNA" is meant complementary or copy DNA produced from an RNA template by the action of RNA-dependent DNA polymerase (reverse transcriptase). Thus a "cDNA clone" means a duplex DNA sequence complementary to an RNA molecule of interest, which may be carried in a cloning vector.

By "vector" is meant a DNA molecule, derived from a plasmid, bacteriophage or hybrid, into which fragments of DNA may be inserted or cloned. A vector will contain one or more unique restriction sites, and may be capable of autonomous replication in a defined host or vehicle organism such that the cloned sequence is reproducible. Thus, by "expression vector" is meant any autonomous element capable of replicating in a host cell independently of the host's chromosome, after a "replicon" has been incorporated into the autonomous element's genome. Such DNA expression vectors include bacterial plasmids and phages and typically include promoter sequences to facilitate gene transcription.

A "replicon" is a sequence of DNA, gene or genes, that when ligated to other DNA causes the entire DNA to be replicated in a cell. The replicon may be on a plasmid, virus, cosmid or chromosome which can replicate in a host cell. The DNA can have any positive number of replicons. DNA containing one or more replicons may occur any positive number of times in a cell.

For the purposes of this application, the term "ACCase gene from C. cryptica" includes all nucleotide sequences possible which encode the same amino acid sequence. By "functional derivative" is meant the "fragments," "variants," "analogs," or "chemical derivatives" of a molecule. A "fragment" of a molecule, such as any of the DNA fragments of the present invention or a cDNA of the ACCase gene, is meant to refer to any nucleotide subset of the molecule. A "variant" of such molecule is meant to refer to a naturally occurring molecule substantially similar in structure and function to either the entire molecule or a fragment thereof. An "analog" of a molecule is meant to refer to a non-natural molecule substantially similar to either the entire molecule or a fragment thereof.
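The breadth of "all nucleotide sequences possible which encode the same amino acid sequence" follows from codon degeneracy and can be illustrated numerically. The sketch below assumes the standard genetic code and one-letter amino acid symbols; the table and function names are illustrative and are not part of the specification.

```python
# Number of codons per amino acid in the standard genetic code (assumed here).
CODON_COUNTS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_encodings(protein):
    """Count the distinct coding sequences for a one-letter protein string.

    The stop codon is ignored; unknown symbols raise a ValueError.
    """
    total = 1
    for residue in protein.upper():
        if residue not in CODON_COUNTS:
            raise ValueError(f"unknown amino acid symbol: {residue!r}")
        total *= CODON_COUNTS[residue]
    return total


if __name__ == "__main__":
    # Toy peptide only: Ala (4 codons) x Ile (3) x Lys (2).
    print(count_encodings("AIK"))
```

Even a three-residue peptide such as Ala-Ile-Lys has 24 possible coding sequences, so the number of sequences encoding a 2089-residue protein is astronomically large.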

A "promoter region" contains a promoter (which directs the initiation of RNA transcription) as well as the DNA sequences which, when transcribed into RNA, will signal the initiation of protein synthesis. "Regulatory regions" contain both the promoter and other elements which control the activity of the promoter. Such regions will normally include those 5'-non-coding sequences involved with initiation of transcription and translation, such as the TATA box, capping sequence, CAAT sequence, and the like. They may also include enhancer, inducer or repressor sequences and binding sites, etc.

A DNA is said to be "capable of expressing" a polypeptide if it contains nucleotide sequences which contain signals for transcriptional and translational initiation, and such sequences are "operably linked" to nucleotide sequences which encode the polypeptide. An operable linkage is a linkage in which the signals for transcriptional and translational initiation and the DNA sequence sought to be expressed are connected in such a way as to permit gene expression. The precise nature of the signals required for gene expression may vary from organism to organism.

The "polymerase chain reaction" or "PCR" is an in vitro enzymatic method capable of specifically increasing the concentration of a desired nucleic acid molecule (Mullis et al., Cold Spring Harbor Symp. Quant. Biol. 51: 263-273 (1986); Erlich et al., EP 50,424, EP 84,796, EP 258,017 and EP 237,362; Mullis, EP 201,184; Mullis et al., U.S. Pat. No. 4,683,202; Erlich, U.S. Pat. No. 4,582,788; and Saiki et al., U.S. Pat. No. 4,683,194). PCR provides a method for selectively increasing the concentration of a particular sequence even when that sequence has not been previously purified and is present only in a single copy in a sample. The method can be used to amplify either single- or double-stranded DNA. The method involves use of two oligonucleotides to serve as primers for the template-dependent, polymerase-mediated replication of a nucleic acid molecule.

The precise nature of the two oligonucleotide primers is critical to the success of the PCR method. As is well known, a molecule of DNA or RNA possesses directionality, which is conferred through the 5'-3' linkage of the phosphate groups. The oligonucleotide primers of the PCR method are selected to contain sequences identical to, or complementary to, sequences which flank the ACCase nucleic acid sequence whose amplification is desired.
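The "identical to, or complementary to" relationship described above can be sketched as follows. This is an illustration only: the helper names are hypothetical, the primers are taken from the two ends of the chosen target region rather than from any sequence disclosed here, and the default primer length of 20 is an assumption.

```python
# Translation table for complementing DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def flanking_primers(template, start, end, primer_len=20):
    """Return (forward, reverse) primers bounding template[start:end].

    The forward primer is identical to the top strand at the 5' end of the
    region; the reverse primer is complementary to the top strand at the
    3' end. Both are written 5' to 3'.
    """
    if end - start < primer_len:
        raise ValueError("region shorter than the primer length")
    forward = template[start:start + primer_len]
    reverse = reverse_complement(template[end - primer_len:end])
    return forward, reverse


if __name__ == "__main__":
    # Toy template only; this is not an ACCase sequence.
    print(flanking_primers("AAACCCGGGTTT", 0, 12, primer_len=3))
```

Because each primer is written 5' to 3', extension from the two primers proceeds toward each other across the target region, which is what allows the region between them to be copied in every cycle.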

The DNA molecule of the present invention can be produced through any of a variety of means, preferably by application of recombinant DNA techniques. Techniques for synthesizing such molecules are disclosed by, for example, Wu, R., et al., Prog. Nucl. Acid. Res. Molec. Biol. 21: 101-141 (1978). Procedures for constructing recombinant molecules in accordance with the above-described method are disclosed by Sambrook et al., Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor Press, Cold Spring Harbor, N.Y. (1989), which reference is herein incorporated by reference.

PCR and many of its variations are well known in the art. By using PCR with the primers described below the ACCase gene can be obtained. By permitting cycles of polymerization and denaturation, a geometric increase in the concentration of the ACCase nucleic acid molecule can be achieved which makes the cloning process much easier or at least possible. Reviews of the PCR are provided below and thus further discussion is not necessary. See Mullis, K. B. (Cold Spring Harbor Symp. Quant. Biol. 51: 263-273 (1986)); Saiki, R. K., et al. (Bio/Technology 3: 1008-1012 (1985)); and Mullis, K. B., et al. (Meth. Enzymol. 155: 335-350 (1987)).
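The geometric increase mentioned above can be made concrete with a small calculation. The sketch assumes an idealized model in which each cycle multiplies the copy number by (1 + efficiency); the `efficiency` parameter and the function name are illustrative assumptions, not values or terms from the specification.

```python
def pcr_copies(initial_copies, cycles, efficiency=1.0):
    """Idealized copy number after a number of PCR cycles.

    efficiency=1.0 models perfect doubling in every cycle; lower values
    model incomplete extension. Real reactions plateau, which this
    sketch deliberately ignores.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    # A single template copy after 30 ideal cycles.
    print(pcr_copies(1, 30))
```

With perfect doubling, one template molecule yields 2^30, or roughly one billion, copies after 30 cycles, which is why a sequence present as a single copy can be recovered in clonable amounts.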

A DNA sequence encoding the ACCase gene of the present invention, or its functional derivatives, may be recombined with vector DNA in accordance with conventional techniques, including restriction enzyme digestion to provide appropriate blunt-ended or staggered-ended termini, filling in of cohesive ends as appropriate, alkaline phosphatase treatment to avoid undesirable joining, ligation with appropriate ligases, or the synthesis of fragments by the polymerase chain reaction (PCR). Techniques for such manipulations are disclosed by Sambrook et al., supra, and are well known in the art.

Once the ACCase gene has been cloned, one may express the gene in a host cell by ligating it to a vector appropriate for the eventual desired host, transferring the vector to the host cell and culturing the host cell in a manner which permits expression of the gene. Numerous vectors, host cells and techniques for their uses are known per se and are discussed in many of the references cited in this application.

Intact functional ACCase protein can be made in a number of organisms by providing a promoter and transcriptional and translational start sites. These genetic elements can be derived from the DNA of other organisms, and it also may be possible to use the genetic elements that naturally occur as part of the C. cryptica ACCase gene. Expression levels of ACCase may vary from less than 1% to more than 30% of total cell protein.

If desired, the non-coding region 3' to the gene sequence coding for the protein may be obtained by the above-described methods. This region may be retained for its transcriptional termination regulatory sequences, such as termination and polyadenylation signals. Thus, by retaining the 3'-region naturally contiguous to the DNA sequence coding for the protein, the transcriptional termination signals may be provided. Where the transcriptional termination signals are not satisfactorily functional in the expression host cell, then a 3' region functional in the host cell may be substituted.

Two DNA sequences (such as a promoter region sequence and the ACCase structural gene sequence) are said to be operably linked if the nature of the linkage between the two DNA sequences does not (1) result in the introduction of a frame-shift mutation, (2) interfere with the ability of the promoter region sequence to direct the transcription of the ACCase gene sequence, or (3) interfere with the ability of the ACCase gene sequence to be transcribed. A promoter region would be operably linked to a DNA sequence if the promoter were capable of effecting transcription of that DNA sequence. Thus, to express the protein, transcriptional and translational signals recognized by an appropriate host are necessary.

Depending on the host cell, one may wish to use either the natural ACCase promoter or a different promoter. The choice of promoters will depend on the host cell and the timing and degree of expression desired. For expression in algae, particularly C. cryptica, the natural promoter and regulatory sequences may be used. For expression in different organisms, a different promoter is usually preferred. However, in order to regulate gene expression differently in C. cryptica, one may use a different regulatory system, which may be artificially modified, or mutate the natural ACCase gene regulatory system.

If the host cell is a bacterium, generally a bacterial promoter and regulatory system will be used. For a typical bacterium such as E. coli, representative examples of well known promoters include trc, lac, tac, trp, bacteriophage lambda P_L, T7 RNA polymerase promoter, etc. When the expression system is yeast, examples of well known promoters include: GAL 1/GAL 10, alcohol dehydrogenase (ADH), his3, cycI, etc. For eukaryotic hosts, enhancers such as the yeast Ty enhancer may be used.

For multicellular organisms, one has additional concerns with expression of the ACCase gene in certain tissues as well as the timing of expression. The choice of promoter is dependent on the eventual use. In such a situation, it may be advantageous to use tissue- or developmental stage-regulated regulatory elements.

For example, if one wished to increase the lipid content of oilseeds, one would use the ACCase structural gene and a promoter which is active in seed development. Expression need not occur at any other location in the plant. Examples include the promoters to seed storage proteins such as phaseolin, napin, oleosin, glycinin, cruciferin, etc. An example of one such promoter, soybean beta-conglycinin, is described by Beachy et al, EMBO J. 4: 3047-3053 (1985).

Alternatively, if one wished for the ACCase to be expressed at only a particular time, such as after the culture or host organism has reached maturity, an externally regulated promoter is particularly useful. Examples include those based upon the nutritional content of the medium (e.g. lac, trp, his), temperature regulation (e.g. temperature sensitive regulatory elements), heat shock promoters (e.g. HSPSOA, U.S. Pat. No. 5,187,267), stress response (e.g. plant EF1A promoter, U.S. Pat. No. 5,177,011) and chemically inducible promoters (e.g. tetracycline inducible promoter or salicylate inducible promoter, U.S. Pat. No. 5,057,422).

In certain uses, such as making a host resistant to herbicides by expressing the ACCase gene, one may wish for the ACCase gene expression to be continuous and in multiple tissue types. Representative examples of constitutive promoters include the Cauliflower Mosaic Virus 35S promoter (Odell et al, Nature 313: 810-812 (1985); Bevan et al, EMBO J. 4: 1921-1926 (1985)) and its enhancer (Simpson et al, Nature 323: 551-554 (1986)), mannopine synthetase promoter (U.S. Pat. No. 5,106,739), nopaline synthetase promoter (Bruce et al, Mol. Cell. Biol. 7: 59 (1987)), the T_L DNA of an Ri plasmid and the OCS promoter and enhancer (Ellis et al, EMBO J. 6: 11 (1987)).

Other promoters of somewhat narrower host range may also be used, such as wheat promoters (U.S. Pat. No. 5,139,954) and the ribulose 1,5-bisphosphate carboxylase promoter (U.S. Pat. No. 4,962,028).

The selection of promoters, enhancers and regulatory elements of all kinds is readily determinable. While not every combination will be successful and not every successful combination will be appropriate for all uses, the choice among known systems is easily determined by those skilled in the art. To further optimize ACCase gene expression, one may mutate the regulatory elements to eliminate or modify one of the activities.

Some promoters are applicable in multiple hosts, such as the soybean heat shock promoter being expressed by sunflower (Schoffl et al, EMBO J. 4: 1119-1124 (1985)). Intracellular plant parasites such as viruses or bacteria typically have promoters recognized by a wider range of host organisms. For example, the Cauliflower Mosaic Virus 35S promoter and Agrobacterium tumefaciens T-DNA promoters have a very wide host range. However, the host range of many regulatory elements is limited to only one or a few species.

Enhancers are usually critical to tissue specific expression of a particular gene. By using the corresponding promoter and enhancer, one may direct synthesis of ACCase to any plant tissue so desired. For example, if higher oil seeds are desired, a seed specific enhancer may be helpful. Likewise, for conferring resistance to a herbicide which inhibits normal plant ACCase but not C. cryptica ACCase, expression in all tissues, or at least tissues exposed to the herbicide such as leaves and stems, is desirable.

Vectors, including expression vectors, may be transferred into a cell by a variety of techniques depending on the host cell. For bacteria, the vector may be added to the host cell by transformation which is well known per se. Generally, recombinant DNA techniques are performed in bacteria for simplicity.

The same techniques can be used when the host cell is a yeast, fungus, alga or plant cell. Before attempting to transform yeast cells, a replicon for yeast needs to be added to the vector. The previous bacterial replicon need not be removed, thereby permitting the plasmid to be shuttled between both organisms in what is called a "shuttle vector".

For transference of a vector to plants, a virus, T-DNA or physical techniques are generally used. As with bacteriophages, plant viruses may be designed to carry foreign DNA by techniques known per se. Agrobacterium tumefaciens is a bacterium which infects many plants and inserts a segment of DNA called T-DNA into the plant genome. By removing unnecessary genes from the T-DNA and adding the ACCase gene of the present invention, the A. tumefaciens carrying the ACCase gene can infect and transfer the gene to a plant host. The techniques for such DNA transfer are known per se. Furthermore, the DNA can be placed inside a plant cell by physical means such as microinjection and more recently by adsorbing the vector onto small particles and propelling or "shooting" them into plant cells or tissue. Use of these recent techniques to transform plants as diverse as maize, soybeans and pine trees is disclosed in U.S. Pat. Nos. 5,015,580 and 5,122,466.

Once plant cells have been transformed with foreign DNA, they may be reproduced and, if not already an entire plant, regenerated into a whole plant. One such example in soybeans is U.S. Pat. No. 5,024,944. Other examples include regeneration of monocotyledonous plants (U.S. Pat. No. 5,187,073) and particularly corn (U.S. Pat. No. 5,177,010). Whole plants may then reproduce and be bred by conventional plant breeding techniques, some of which have been used for thousands of years.

In some cases, the transformed cells of a host may be selected for based upon the newly acquired property of herbicide or antibiotic resistance. As such, the ACCase gene of the present invention may be used as a selectable marker for detecting transformation. The ACCase gene may also be used as a reporter gene to which a number of promoters or regulatory regions may be added in order to assay for a promoter or to discover additional gene regulators. The choice of host cell for these functions is limited only to those that naturally contain an ACCase that is sensitive to compounds that have no pronounced effects on the activity of the C. cryptica ACCase.

ACCase from many monocotyledonous plants is strongly inhibited by several herbicides, particularly the aryloxyphenoxypropionate and cyclohexanedione herbicides (Burton et al, Biochem. Biophys. Res. Commun. 148: 1039-1044 (1987)). The mechanism of action of these classes of herbicides is by inhibiting the activity of ACCase. ACCase from C. cryptica is not strongly inhibited by these herbicides. Thus, the incorporation and expression of this gene into many monocotyledonous crop plants would be beneficial, as it would allow the use of these herbicides in fields where monocotyledonous weeds and other susceptible weeds occur. Examples of desirable monocotyledonous crops include both agricultural species such as corn, wheat, rice, barley, sugarcane, onion, garlic, asparagus, pineapple, etc. and ornamental plants such as grass, lily, orchids, narcissus, etc. Similarly, this technique may be used for all other plants to make them resistant or more resistant to the effects of these classes of herbicides.

Techniques for producing herbicide resistance in plants by incorporating DNA encoding and expressing enzymes resistant to herbicides are known. For example, a different glutamine synthetase gene was added to make plants resistant to the herbicide phosphinothricin, U.S. Pat. No. 5,098,838 and U.S. Pat. No. 5,145,777. In a similar fashion, plants have been made resistant to different herbicides by adding foreign DNA encoding Glutathione S-Transferase which detoxifies certain herbicides, e.g. U.S. Pat. No. 5,073,677.

Perhaps the best known of the techniques for preparing a plant with an added foreign gene imparting herbicide resistance is that of glyphosate resistance (see Comai et al, Nature 313: 741-744 (1985); U.S. Pat. Nos. 4,940,835 and 5,188,642). In this example a chloroplast transit sequence is added upstream from the herbicide resistance gene so that the protein product is transported into the chloroplasts.

In the same manner, and even using the same techniques and vectors, one or more copies of the ACCase gene from C. cryptica encoding herbicide resistance may be substituted for one of the other herbicide resistance genes of the references above. Since ACCase normally performs its function in the chloroplast, it is particularly relevant to use the above mentioned transit sequence or other plastid transit sequence to ensure expression in the chloroplast or other plastid. It may also be adequate or advantageous to express the ACCase gene in the cytoplasm (or endoplasmic reticulum) alone or supplementally. In such a situation, at least one of the gene construct(s) on the vector would not contain a plastid transit sequence.

Having generated a plant variety with a stable C. cryptica ACCase gene, one can cultivate the plant or plant cells in a conventional manner. If the plant cell is an alga, the gene may optionally be induced according to the regulatory regions and the lipids recovered by means conventional for recovering lipids from natural algae. If the plant has been designed to overproduce lipids, it may be grown, the ACCase gene induced and the lipids recovered by conventional methods. If the plant expresses the ACCase gene of the present invention for the purpose of making the plant resistant to a herbicide, it may be grown in soil (or a soil-less potting mix, hydroponic medium, etc.) and the herbicide applied to inhibit weeds. For the purposes of this application "soil" is defined as any medium supporting plant growth, such as soil, water (for algae), sand, soil-less potting mixes, hydroponic medium, etc.

Current attempts to alter the level of saturated fat content in animals and animal products have focused on conventional breeding rather than on preparing transgenic animals. Attempts to generate transgenic animals with altered lipid content have focused on adding a growth hormone gene to decrease overall fat content of the animal (Palmiter et al, Nature 300: 611-615 (1982)). In the present invention, one may add the ACCase gene simultaneously in the same plasmid or separately with the recombinant growth hormone gene in order to produce an animal which will have an altered ratio of fatty acids in its tissue. Alternatively, the ACCase gene may be added alone as the recombinant gene. In this fashion, the meat, milk or eggs from the transgenic animal may have a different ratio of saturated to unsaturated fats.

The ACCase molecule is said to be "substantially similar" to another molecule if the sequence of amino acids in both molecules is substantially the same. Substantially similar ACCase molecules will possess a similar biological activity. Thus, provided that two molecules possess a similar activity, they are considered "variants" as that term is used herein even if one of the molecules contains additional amino acid residues not found in the other, or if the sequence of amino acid residues is not identical. The ACCases from rat, yeast and E. coli are not considered substantially similar.

Similarly, a "functional derivative" of the ACCase gene of the presentinvention is meant to include shortened versions of the gene whichencode a functionally equivalent ACCase, "variants," or "analogues" ofthe gene, which are "substantially similar" in amino acid sequence, andwhich encode a molecule possessing similar activity.

The nucleotide sequence may be altered to optimize the sequence for a given host. Different organisms have different codon preferences, as has been reported previously. Furthermore, the nucleotide sequence may be altered to provide the preferred three-dimensional configuration of the mRNA produced, to enhance ribosome binding and expression. Introns may be removed from the gene either by restriction endonuclease cleavage or by using the cloned gene as a hybridization probe for conventional cDNA cloning, which may be applied to the ACCase gene. Note that the introns are provided in the sequence recited in the example. Alternatively, the same or different introns may be added to the gene at acceptable locations. Enhancer element(s) may be located in the intron(s).
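The codon substitution described above can be sketched as follows. This is a minimal illustration, not the procedure of the invention: the `PREFERRED` table is a hypothetical set of host preferences invented for the example, and the input is simply the first nine codons of the ACCase open reading frame recited in the example. The sketch replaces codons with synonymous ones and confirms that the encoded amino acid sequence is unchanged.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Hypothetical host codon preferences (illustrative assumption, not measured data).
PREFERRED = {"L": "CTG", "R": "CGC", "A": "GCC", "G": "GGC"}

# Every preferred codon must actually encode the amino acid it is listed under.
for aa, codon in PREFERRED.items():
    assert CODON_TABLE[codon] == aa


def codons(dna: str):
    """Yield successive complete codons of an in-frame coding sequence."""
    for i in range(0, len(dna) - len(dna) % 3, 3):
        yield dna[i:i + 3]


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence into one-letter amino acids."""
    return "".join(CODON_TABLE[c] for c in codons(dna))


def recode(dna: str, preferred: dict) -> str:
    """Swap each codon for the host-preferred synonymous codon, if one is listed."""
    return "".join(preferred.get(CODON_TABLE[c], c) for c in codons(dna))


original = "ATGGCTCTCCGTAGGGGCCTTTACGCT"  # first nine codons of the ACCase ORF
recoded = recode(original, PREFERRED)

print(translate(original))  # MALRRGLYA
print(recoded)
print(translate(recoded) == translate(original))  # True
```

In practice the preference table would be derived from the codon usage of the intended host; only the principle of synonymous replacement is shown here.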

In the present invention, substantially similar ACCases can be made by changing the nucleotide sequence to produce a different amino acid sequence. Such changes may be advantageous to change the enzymatic properties of the ACCase. Alternatively, the change can be made to enhance production of active enzyme, such as changing internal amino acids to permit cleavage of ACCase from a fusion peptide or to add or subtract a site for various proteases. See, e.g., Oike, Y., et al., J. Biol. Chem. 257: 9751-9758 (1982); Liu, C., et al., Int. J. Pept. Protein Res. 21: 209-215 (1983). It should be noted that separation of ACCase from a leader sequence is not necessary provided that the ACCase activity is sufficiently acceptable.

Furthermore, if the ACCase gene uses a portion of another gene, such as an N-terminal region of said another gene, then it is advantageous to include a sequence encoding a cleavage site between said another gene and the ACCase gene. The cleavage site is preferably recognized by one of the host cell's internal proteases.

Changes to the sequence such as insertions, deletions and site-specific mutations can be made by random chemical or radiation-induced mutagenesis, restriction endonuclease cleavage, transposon or viral insertion, oligonucleotide-directed site-specific mutagenesis, or by such standard techniques as those of Botstein et al., Science 229: 193-210 (1985). These techniques are known per se and have been applied to a number of genes previously. Similar changes have been made in the structural genes encoding other plant enzymes affected by herbicides. One such example affecting glyphosate resistance is shown by U.S. Pat. No. 5,145,783.

Such changes may be made in the present invention to alter the enzymatic activity, render the enzyme more susceptible or resistant to temperature or chemicals (including herbicides), alter regulation of the ACCase gene, and to optimize the gene expression for any given host. These changes may be the result of either random changes or changes to a particular portion of the ACCase molecule believed to be involved with a particular function.

To further enhance expression, the final host organism may be mutated so that it will change gene regulation or its production of the ACCase gene product.

Unless specifically defined otherwise, all technical or scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, the preferred methods and materials are now described.

EXAMPLE

For the experiments below, the strain Cyclotella cryptica T13L was employed. This strain was obtained from the Bigelow Laboratory Culture Collection of Marine Phytoplankton, West Boothbay Harbor, Maine. C. cryptica was cultured as described in Roessler, J. Phycol. 24: 394-400 (1988).

ACCase from C. cryptica was purified to near homogeneity by means of ammonium sulfate precipitation, gel filtration chromatography, and monomeric avidin affinity chromatography as described previously (Roessler, Plant Physiol. 92: 73-78 (1990)), and then cleaved by the addition of CNBr. The peptides were separated by SDS-polyacrylamide gel electrophoresis, transferred onto a ProBlott membrane (Applied Biosystems; Foster City, Calif.), and stained with Coomassie Blue. Individual bands were excised for automated sequencing via the Edman degradation procedure, using an Applied Biosystems 477A protein sequenator with an on-line 120A PTH analyzer.

Partial amino acid sequences were determined for several peptides generated via CNBr-mediated cleavage of ACCase from C. cryptica. The sequences of two of these peptides were quite similar to sequences found in the biotin carboxylase domain of ACCase from rat mammary glands (Lopez-Casillas et al., Proc. Natl. Acad. Sci. U.S.A. 85: 5784-5788 (1988)) and chicken liver (Takai et al., J. Biol. Chem. 263: 2651-2657 (1988)) and were therefore used to design degenerate oligonucleotides for use as PCR primers. A 128-fold degenerate forward polymerase chain reaction (PCR) primer (PR1) and a 256-fold degenerate reverse PCR primer (PR2) were designed based on reverse translations of these two amino acid sequences. The sequences for the primers are given as follows:

    ______________________________________
    PR1 = TTYGTNTGGAAYGARGCNGA   SEQ ID NO:13
    PR2 = ACNGCRTTNCCRTGYTGRTC   SEQ ID NO:14
    ______________________________________
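The stated degeneracies of PR1 and PR2 can be checked by multiplying the number of bases represented at each position. The short sketch below assumes the standard IUPAC nucleotide codes (Y = C/T, R = A/G, N = any base); the function name is chosen only for this illustration.

```python
# Number of bases represented by each IUPAC nucleotide code.
IUPAC_DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def primer_degeneracy(primer: str) -> int:
    """Return how many distinct sequences a degenerate primer represents."""
    total = 1
    for base in primer.upper():
        total *= IUPAC_DEGENERACY[base]
    return total


PR1 = "TTYGTNTGGAAYGARGCNGA"  # SEQ ID NO:13
PR2 = "ACNGCRTTNCCRTGYTGRTC"  # SEQ ID NO:14

print(primer_degeneracy(PR1))  # 128
print(primer_degeneracy(PR2))  # 256
```

PR1 contains two Y, one R and two N positions (2 x 2 x 2 x 4 x 4 = 128), and PR2 contains one Y, three R and two N positions (2 x 2 x 2 x 2 x 4 x 4 = 256), in agreement with the figures given above.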

Each 25 μl of PCR reaction mixture contained 50 ng DNA from C. cryptica, 0.1 μM of each primer species, 10 mM Tris-Cl (pH 8.3), 50 mM KCl, 1 mM MgCl₂, 0.2 mM dNTPs, and 1 U Taq DNA polymerase (Perkin Elmer-Cetus; Norwalk, Conn.). The following thermal cycle was used: Step 1, 94° C. for 5 min; Step 2, 94° C. for 1 min; Step 3, 45° C. for 2 min; Step 4, 2° C./sec to 72° C.; Step 5, repeat steps 2 to 4 for 30 times total; and Step 6, 72° C. for 8 min.

Using these primers, a 146-bp fragment was amplified from C. cryptica total DNA. This fragment was subcloned into the phagemid pBluescript KS+ (Stratagene; La Jolla, Calif.) that had been digested with EcoRV. The deduced amino acid sequence of this fragment exhibited 58% identity with the corresponding sequence of rat ACCase, thereby confirming that a C. cryptica ACCase gene fragment had been amplified. This sequence is shown below:

    __________________________________________________________________________
    CYCLOTELLA  . . . LRNAFVQVSNEVIGSPIFLMQLCKNARHIEVQIVG . . .  SEQ ID NO:15
    RAT         . . . FPNLFRQVQAEVPGSPIFVMRLAKQSRHLEVQILA . . .  SEQ ID NO:16
    __________________________________________________________________________
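A percent-identity figure of this kind is obtained by counting matching positions in an ungapped alignment. The sketch below applies that count to the 35-residue excerpts shown above; it is an illustration only, and because the excerpts cover part of the fragment rather than all of it, the value for the excerpt (about 57%) need not equal the 58% stated for the fragment.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of identical positions in two equal-length, ungapped sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


CYCLOTELLA = "LRNAFVQVSNEVIGSPIFLMQLCKNARHIEVQIVG"  # excerpt of SEQ ID NO:15
RAT = "FPNLFRQVQAEVPGSPIFVMRLAKQSRHLEVQILA"         # excerpt of SEQ ID NO:16

identical = sum(a == b for a, b in zip(CYCLOTELLA, RAT))
print(identical, len(CYCLOTELLA))                    # 20 35
print(round(percent_identity(CYCLOTELLA, RAT), 1))   # 57.1
```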

In order to isolate the full-length ACCase gene, a genomic Lambda library was constructed. Total DNA was purified from C. cryptica as described by Jarvis et al. (Jarvis et al., J. Phycol. 28: 356-362 (1992)), except that the cells were disrupted in the extraction buffer by gentle inversions instead of by agitation with glass beads. The DNA was purified from contaminating polysaccharides by the use of hexadecyltrimethylammonium bromide (CTAB) (Murray et al., Nucleic Acids Res. 8: 4321-4325 (1980)), and then partially digested with Sau3AI. After partially filling in the overhangs by the addition of dGTP, dATP, and the Klenow fragment of E. coli DNA Polymerase I, the DNA was ligated to XhoI half-site arms of the Lambda phage derivative LambdaGEM-12 (Promega Corp.; Madison, Wis.) according to the manufacturer's instructions.

The entire unamplified library (~4×10⁴) was plated out, using E. coli KW251 as the host strain. Plaques were lifted onto nitrocellulose membrane filters, which were treated with NaOH and neutralized via standard conditions (Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Laboratory Press, N.Y. (1989)). After baking in vacuo for 1 h at 80° C., the filters were washed for 10 h at 42° C. in 5X SSPE/0.5% SDS, and then prehybridized for 6 h at 42° C. in hybridization solution (7% SDS/30% formamide/2X SSPE). The filters were then immersed in fresh hybridization solution containing a ³²P-labeled RNA transcript generated in vitro from the subcloned 146-bp PCR product and incubated for 20 h at 42° C. The filters were washed for 5 min at 20° C. with 2X SSPE/0.2% SDS (twice), and then once with 1X SSPE/0.2% SDS for 30 min at 50° C. Autoradiograms of the filters were made with the aid of an enhancement screen (DuPont Cronex, Wilmington, Del.).

Four independent clones were isolated in this manner, and restriction mapping indicated that all four clones contained common sequences. The largest insert (14 kb) was digested separately with EcoRI and BamHI and the resulting restriction fragments were subcloned into pUC118 or pBluescript KS+.

These subclones were sequenced by the method of Kraft et al. (Kraft et al., Biotechniques 6: 544-546 (1988)) using a combination of universal and gene-specific primers.

This analysis indicated the presence of two large open reading frames (ORFs) in close proximity to one another; the largest ORF was 4.1 kb long and was immediately downstream from a smaller 2.2-kb ORF.

Comparison of the deduced amino acid sequences of these ORFs to the sequences of animal and yeast ACCase indicated that the 2.2-kb ORF corresponded to the biotin carboxylase domain of ACCase whereas the 4.1-kb ORF contained sequences that could be aligned with the biotin carboxyl carrier protein and carboxyltransferase domains.

The lack of an ORF long enough to encode a 200-kDa polypeptide suggested the presence of an intron between the 2.2-kb and 4.1-kb ORFs. This possibility was tested by using the PCR procedure to amplify cDNA generated from C. cryptica total RNA, utilizing opposing gene-specific primers (JO49 and JO63) that annealed to the cDNA on each side of the predicted intron splicing site. The nucleotide sequences of these two primers are as follows:

    ______________________________________
    JO49 = TGTCCAATTTGCCCGAA   SEQ ID NO:17
    JO63 = TAAAGTTGAGATGCCCT   SEQ ID NO:18
    ______________________________________
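The phrase "opposing primers" means that one primer matches the coding strand while the other matches its reverse complement. The sketch below checks this for JO63 and JO49 against two short excerpts of the genomic sequence recited later in this example; the excerpt boundaries and the helper names are choices made for this illustration only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_orientation(primer: str, template: str) -> str:
    """Report whether a primer matches the given strand or its complement."""
    if primer in template:
        return "forward"
    if reverse_complement(primer) in template:
        return "reverse"
    return "no match"


JO49 = "TGTCCAATTTGCCCGAA"  # SEQ ID NO:17
JO63 = "TAAAGTTGAGATGCCCT"  # SEQ ID NO:18

# Short excerpts of the coding strand, upstream and downstream of the 73-bp intron.
UPSTREAM_EXCERPT = "AAGTCCGTTAAAGTTGAGATGCCCTCTCACTTAG"
DOWNSTREAM_EXCERPT = "GAAACTTTTTCGGGCAAATTGGACATTATGGAATCG"

print(primer_orientation(JO63, UPSTREAM_EXCERPT))    # forward
print(primer_orientation(JO49, DOWNSTREAM_EXCERPT))  # reverse
```

With JO63 reading forward from upstream of the intron and JO49 reading back from downstream of it, a product amplified from cDNA is expected to be shorter than the corresponding genomic product by the length of the intron.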

For this procedure, total RNA was isolated from C. cryptica cells by a modification of the procedure described by Bascomb et al., Plant Physiol. 83: 75-84 (1987). The modifications included grinding the cells with a mortar and pestle in liquid nitrogen, instead of using a French press, and passing the isolated RNA through a Sigmacell 50 (Sigma; St. Louis, Mo.) column to remove contaminating polysaccharides. Randomly primed synthesis of cDNA and subsequent PCR amplification of ACCase-encoding cDNA using ACCase-specific oligonucleotide primers were carried out by the use of a "GeneAmp" RNA-PCR kit (Perkin Elmer-Cetus). The following PCR thermal cycle was used: Step 1, 94° C. for 2 min; Step 2, 94° C. for 1 min; Step 3, 45° C. for 1 min; Step 4, 2° C./sec to 72° C.; Step 5, 72° C. for 1.5 min; Step 6, repeat steps 2 to 5 for 45 times total; and Step 7, 72° C. for 10 min. PCR products were gel-purified and subcloned into the plasmid pCR 1000 (Invitrogen; San Diego, Calif.). E. coli INVαF' cells were transformed with the recombinant plasmids, and plasmid DNA was purified and sequenced as described above.

Sequence analysis of the resulting PCR product confirmed that a 73-bp intron is located approximately 125 bp upstream from the region of the gene that encodes the biotin binding site.

An in-frame translation initiation codon was not present in the first large (2.2-kb) ORF upstream from a region that exhibited strong similarity to ACCase sequences from other species, suggesting that an additional intron might be present near the 5' end of the gene. The 5'-RACE procedure ("Rapid Amplification of cDNA Ends", Frohman et al., Proc. Natl. Acad. Sci. U.S.A. 85: 8998-9002 (1988)) was used to examine this possibility. 5'-RACE was carried out by the use of a kit (BRL-Life Technologies; Gaithersburg, Md.). The primer used for cDNA synthesis was PR10, while JO66 and the kit-supplied anchor primer were used for PCR amplification.

    ______________________________________
    PR10 = CCAAACGGCATCAACCC   SEQ ID NO:19
    JO66 = GTTGGCGTAGTTGTTCA   SEQ ID NO:20
    ______________________________________

The following PCR thermal cycle was used: Step 1, 94° C. for 3 min; Step 2, 94° C. for 1 min; Step 3, 45° C. for 1 min; Step 4, 72° C. for 2 min; Step 5, repeat steps 2 to 4 for 40 times total; and Step 6, 72° C. for 10 min. RACE products were digested with SpeI (which cleaves within the anchor primer) and KpnI (which cleaves within the coding region of the ACCase gene), gel-purified, and subcloned into SpeI/KpnI-digested pBluescript KS+. E. coli DH5αF' cells were transformed with the recombinant plasmids, and transformants were screened with a labeled DNA probe specific for the 5' end of the ACCase gene. The plasmids containing the largest inserts were sequenced as described above.

The longest RACE product obtained indicated the presence of a 447-bp intron. However, the amplified DNA did not extend in the 5' direction far enough to include a potential initiation codon, although analysis of the genomic sequence indicated that an in-frame ATG codon was present less than 50 bp upstream from the 5' end of the RACE clone. Therefore, a forward PCR primer (PR19) having a sequence of:

    ______________________________________
    PR19 = GCATTTCCTCACGATAG   SEQ ID NO:21
    ______________________________________

that annealed slightly upstream from this putative initiation codon was used along with a reverse primer (JO66) that annealed downstream from the 447-bp intron to amplify cDNA generated from total RNA.

An intron-free ACCase gene fragment was obtained by this procedure, and since an in-frame stop codon is present in the cDNA only 15 bp upstream from the putative ATG initiation site, this ATG appears to represent the true translation initiation codon. Removal of the 73-bp and 447-bp introns yields an ORF of nearly 6.3 kb. Additional RNA-PCR experiments using primer pairs bracketing other regions of the ACCase gene have not indicated the presence of other introns.
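The effect of intron removal on the reading frame can be illustrated with a short excerpt of the genomic sequence spanning the 73-bp intron, written with the intron in lower case. The sketch below is an illustration under stated assumptions: the excerpt is chosen to begin at a codon boundary (judged by comparison with the deduced protein sequence), and the helper names are hypothetical. It also checks that 2089 codons plus a stop codon account for an ORF of nearly 6.3 kb.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def splice(genomic: str) -> str:
    """Remove introns, which are marked by lower-case letters."""
    return "".join(base for base in genomic if base.isupper())


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence into one-letter amino acids."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Genomic excerpt around the 73-bp intron (exon upper case, intron lower case).
GENOMIC_EXCERPT = (
    "CTGTCATTGGACGGGGCAACTGTCCTAAT"
    "gtaagttgtctgtccctcgatgtcgctgtttca"
    "tctgtagtcaagtatcctcaccttatgtacttattcgtag"
    "GCCAACAATTTTTGACCCCTCT"
)

intron_length = sum(base.islower() for base in GENOMIC_EXCERPT)
spliced = splice(GENOMIC_EXCERPT)

print(intron_length)        # 73
print(translate(spliced))   # LSLDGATVLMPTIFDPS

# 2089 codons plus one stop codon.
orf_length_bp = (2089 + 1) * 3
print(orf_length_bp)        # 6270
```

Note that the intron interrupts a codon: the AT at the end of the upstream exon and the G at the start of the downstream exon combine, after splicing, into the ATG that encodes the methionine in LSLDGATVLMPTIFDPS.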

The DNA sequence from start codon to stop codon, including introns, is as follows. The introns are shown in lower case.

    __________________________________________________________________________    ATGGCTCTCCGTAGGGGCCTTTACGCTGCTGCAGCGACTGCCATCTTGGTCACGGCTTCAGT                GACCGCTTTTGgtaagtctgcatttggattgatggttagcattccccacgagcagcatgttg                tgttacgcgttgttgcgtagtgtcagttgtgataattatgatcgacaagaatgggaggactc                tttttgtatcgtttgtagagtgttacactggaccttcgcctaaacacgtttggaggtcctca                catccgcgacgagagctcccacatttcatctacatctctacgtgagcgaatttacgtcacct                ggctattcatttgaggtcccttcctcccacgtgcttccatgttccttagggcgcttaagcat                agttgcacttggagcacttgttgtcaaattgtcgtgtacccgtcactttcgaagcgttattt                ggggttggctggtcctatttaaacagaaattattacgatgtttcgctaacgattctttctct                cattttttaacctacacgaaacagCTCCTCAGCATTCGACATTCACCCCCCAATCGCTCTCG                GCGGCACCCACGCGCAACGTCTTCGGCCAGATCAAAAGCGCCTTCTTCAACCATGATGTTGC                CACCTCTCGAACCATTCTTCACGCCGCGACACTAGATGAAACTGTTCTTTCCGCTTCAGACT                CCGTCGCCAAATCTGTCGAAGACTACGTGAAATCCCGTGGTGGAAATCGCGTCATTCGTAAA                GTCCTCATCGCCAACAACGGCATGGCCGCGACAAAGTCCATCCTCTCCATGCGTCAATGGGC                CTACATGGAATTCGGGGACGAACGTGCCATCCAGTTCGTTGCGATGGCGACTCCCGAGGATT                TGAAGGCGAACGCCGAATTTATTCGCTTGGCGGATTCTTTCGTCGAGGTACCGGGAGGAAAG                AACTTGAACAACTACGCCAACGTCGATGTCATTACCCGCATCGCTAAGGAGCAGGGGGTTGA                TGCCGTTTGGCCTGGATGGGGTCATGCATCTGAGAATCCGAAGCTCCCTAATGCGCTTGACA                AATTGGGAATCAAGTTCATTGGACCAACTGGGCCTGTCATGAGCGTTTTGGGAGACAAGATT                GCTGCGAACATTCTAGCACAGACAGCGAAAGTCCCCTCCATTCCCTGGAGTGGATCCTTTGG                TGGACCAGACGATGGACCCCTTCAGGCGGATCTGACCGAGGAGGGTACTATCCCAATGGAAA                TCTTTAACAAGGGATTAGTAACCTCTGCTGATGAAGCCGTCATTGTGGCGAACAAGATTGGC                TGGGAGAACGGAATCATGATCAAGGCTTCTGAGGGTGGAGGAGGAAAGGGTATACGCTTTGT                CGACAATGAGGCCGACTTACGGAACGCGTTCGTTCAGGTGTCCAATGAAGTGATTGGCTCTC                CTATTTTCCTCATGCAGTTGTGTAAGAACGCTCGTCACATCGAAGTGCAAATTGTTGGCGAC                
CAGCACGGAAATGCTGTAGCGTTGAACGGTCGAGATTGCTCCACTCAGCGTCGCTTCCAGAA                GATCTTCGAGGAAGGTCCTCCGTCCATTGTACCGAAAGAAACATTCCACGAGATGGAACTTG                CGGCTCAACGGTTGACTCAAAACATTGGGTATCAAGGTGCTGGAACTGTGGAATACTTGTAC                AACGCCGCTGACAATAAGTTTTTCTTCCTTGAGTTGAACCCCCGTCTCCAAGTGGAGCATCC                TGTGACTGAAGGAATTACCGGCGCTAATCTTCCTGCCACTCAGCTTCAAGTTGCTATGGGTA                TTCCTCTCTTCAACATTCCTGACATTCGCCGTCTCTATGGAAGAGAGGATGCTTACGGAACG                GATCCCATTGATTTTCTTCAAGAACGTTACCGCGAACTCGACTCTCATGTAATTGCTGCCCG                CATCACTGCTGAAAACCCCGATGAAGGATTCAAACCCACCTCAGGCTCAATTGAGCGAATCA                AATTTCAATCCACCCCAAATGTTTGGGGATATTTCTCTGTTGGTGCTAACGGTGGAATCCAT                GAATTTGCCGACTCTCAGTTTGGCCATCTTTTCGCTAAGGGTCCGAACCGTGAGCAAGCCCG                CAAGGCATTGGTTTTGGCTCTTAAGGAGATGGAAGTGCGCGGAGACATTCGTAACTCTGTTG                AATACCTAGTCAAGTTGCTCGAAACTGAAGCTTTCAAGAAGAACACTATCGACACGTCTTGG                TTAGATGGCATTATTAAGGAGAAGTCCGTTAAAGTTGAGATGCCCTCTCACTTAGTGGTTGT                CGGAGCCGCTGTTTTCAAGGCCTTCGAACATGTTAAGGTGGCCACTGAAGAAGTTAAGGAAT                CGTTTCGAAAAGGACAAGTCTCCACTGCAGGGATTCCAGGCATAAACTCGTTCAACATCGAA                GTTGCGTACTTAGACACGAAGTACCCATTCCACGTAGAACGGATCTCTCCAGATGTTTACAG                GTTTACCTTGGACGGGAACACGATTGATGTGGAAGTTACCCAAACCGCTGAAGGAGCACTTT                TGGCAACCTTTGGAGGAGAGACTCATCGTATCTTTGGTATGGACGAACCACTTGGCCTTCGA                CTGTCATTGGACGGGGCAACTGTCCTAATgtaagttgtctgtccctcgatgtcgctgtttca                tctgtagtcaagtatcctcaccttatgtacttattcgtagGCCAACAATTTTTGACCCCTCT                GAACTCCGCACTGATGTGACTGGAAAGGTTGTTCGTTACCTCCAAGACAATGGAGCAACTGT                TGAAGCGGGCCAGCCCTATGTCGAGGTTGAAGCGATGAAGATGATCATGCCAATCAAGGCTA                CTGAGTCTGGAAAAATTACTCACAACCTAAGTGCTGGATCTGTAATCTCTGCTGGTGACCTT                CTTGCTTCTCTCGAACTTAAGGATCCCTCTAGGGTTAAGAAAATAGAAACTTTTTCGGGCAA                ATTGGACATTATGGAATCGAAGGTTGACTTAGAACCGCAGAAAGCAGTCATGAATGTCCTCT                
CTGGGTTCAACTTAGACCCTGAGGCAGTTGCGCAGCAAGCAATTGACAGTGCTACCGACAGC                TCTGCCGCAGCCGATCTTCTTGTCCAAGTATTAGACGAATTCTATCGCGTTGAATCTCAGTT                TGATGGTGTCATCGCTGATGATGTTGTCCGCACTCTCACCAAAGCGAACACCGAGACACTTG                ATGTTGTCATCTCCGAGAACTTGGCCCACCAGCAGCTCAAGAGGCGTAGTCAGCTTCTCCTC                GCTATGATCCGTCAACTTGACACGTTTCAAGACAGATTTGGCAGAGAAGTTCCGGATGCTGT                CATTGAAGCATTGAGTAGGCTTTCTACCTTGAAAGACAAATCTTACGGTGAAATCATTCTTG                CGGCTGAGGAGAGAGTCCGCGAAGCCAAGGTGCCGTCCTTCGAAGTGCGTCGTGCTGATTTG                CGTGCAAAGCTTGCTGACCCGGAGACAGATTTGATTGACCTGAGTAAGAGCTCAACACTCTC                AGCAGGGGTTGACCTTCTCACAAATCTTTTTGATGACGAAGATGAATCTGTCCGCGCTGCTG                CTATGGAAGTATATACTCGCCGTGTCTACCGTACCTACAACATCCCCGAGCTAACTGTTGGA                GTTGAGAATGGCCGCCTCTCATGTAGCTTCTCCTTCCAATTTGCTGATGTCCCGGCGAAAGA                CCGTGTCACCCGCCAAGGGTTCTTCTCAGTTATCGACGACGCTTCAAAGTTCGCGCAACAGC                TTCCTGAGATTCTCAACTCGTTTGGATCAAAGATCGCAGGGGATGCAAGCAAAGAAGGCCCT                GTCAATGTTTTGCAGGTTGGTGCTCTCTCGGGAGATATCAGTATTGAGGACCTCGAGAAAGC                TACTTCCGCTAACAAGGACAAGTTGAATATGCTTGGTGTCCGCACTGTGACGGCTCTTATCC                CAAGGGGAAAGAAGGACCCAAGCTATTATTCATTCCCCCAATGCAGTGGCTTCAAGGAGGAT                CCTCTTCGCAGAGGCATGCGCCCAACCTTTCATCATCTCCTGGAACTCGGACGGCTGGAGGA                AAACTTTGCTCTTGAACGAATTCCTGCAGTTGGACGCAACGTACAGATTTATGTTGGTTCCG                AGAAGACGGCAAGGCGAAATGCAGCTCAAGTTGTTTTCTTGAGAGCTATCTCACATACTCCT                GGCCTAACTACCTTCTCTGGTGCACGCCGAGCTCTTCTCCAGGGGCTTGACGAATTGGAACG                TGCTCAAGCAAACTCAAAGGTCAGTGTCCAGTCATCGTCTCGCATCTACCTTCACTCTCTCC                CAGAACAGTCTGATGCAACTCCCGAGGAGATTGCTAAAGAATTCGAAGGTGTCATTGACAAG                CTAAAGAGTCGATTGGCCCAACGTCTTACGAAACTGCGTGTGGATGAGATTGAAACCAAGGT                TCGCGTGACTGTCCAGGATGAAGACGGTAGTCCCAGGGTTGTGCCTGTACGCCTTGTGGCTT                CTTCAATGCAAGGCGAATGGCTTAAAACATCTGCTTACATTGATCGTCCGGACCCGGTCACT                
GGAGTCACCCGTGAACGGTGCGTGATTGGAGAAGGCATTGACGAGGTTTGTGAACTTGAGTC                GTATGACTCTACCAGTACCATCCAAACAAAGCGCTCAATTGCAAGACGTGTGGGATCTACCT                ACGCTTATGACTACCTTGGACTCCTTGAGGTCAGCTTGCTTGGAGAATGGGATAAGTATCTC                AGCAGTCTCTCAGGACCGGACACCCCTACCATCCCGTCGAATGTTTTTGAAGCTCAAGAGTT                ACTTGAAGGACCTGATGGCGAGCTTGTCACCGGGAAACGTGAAATTGGAACAAATAAGGTTG                GTATGGTTGCATGGGTGGTAACAATGAAAACACCTGAATATCCTGAGGGTCGACAGGTTGTT                GTAATTGTGAACGATGTCACTGTACAAAGTGGTTCATTTGGAGTTGAGGAGGATGAAGTTTT                CTTCAAGGCCTCCAAATATGCTCGCGAAAATAAGCTCCCCCGTGTCTACATTGCGTGCAACT                CTGGTGCTAGAATTGGTTTGGTGGATGATCTCAAGCCAAAGTTCCAGATCAAATTCATTGAT                GAGGCGAGTCCATCTAAGGGTTTTGAGTACCTTTATCTTGATGATGCAACGTACAAATCTCT                TCCAGAAGGGTCGGTAAATGTAAGGAAGGTCCCTGAAGGCTGGGCTATCACTGATATCATTG                GAACGAACGAAGGAATTGGGGTTGAGAACCTTCAAGGAAGTGGCAAAATTGCTGGCGAGACA                TCAAGGGCATATGATGAAATCTTCACCTTGAGTTACGTCACAGGTAGAAGTGTTGGTATTGG                AGCTTACCTTGTCCGTCTCGGCCAGCGTATTATTCAGATGAAACAAGGACCCATGATTCTCA                CAGGCTATGGTGCCCTGAATAAGCTTCTCGGCCGTGAAGTGTACAACTCAAACGACCAACTT                GGTGGTCCTCAAGTCATGTTCCCAAACGGCTGCTCTCATGAAATTGTAGATGATGACCAACA                AGGCATCCAGTCCATTATCCAATGGCTAAGCTTTGTTCCCAAGACAACTGATGCTGTGTCAC                CCGTCCGTGAATGTGCCGACCCTGTCAACAGGGATGTTCAATGGCGCCCTACCCCCACTCCT                TATGATCCACGCCTCATGCTCTCAGGAACTGACGAGGAACTCGGTTTTTTTGACACAGGAAG                CTGGAAGGAATATCTTGCTGGCTGGGGGAAGAGTGTTGTTATTGGCCGCGGTCGCCTTGGTG                GCATTCCTATGGGTGCTATTGCCGTGGAGACCCGGCTTGTTGAGAAGATTATCCCTGCAGAT                CCAGCAGACCCCAACTCCCGCGAAGCTGTCATGCCCCAGGCTGGACAAGTTCTTTTCCCTGA                CTCATCCTACAAGACAGCCCAAGCTCTCCGCGACTTTAATAACGAGGGCCTCCCTGTGATGA                TTTTCGGCAACTGGCGTGGATTTAGTGGTGGAAGTCGTGACATGTCTGGTGAAATCCTCAAA                TTTGGATCCATGATTGTCGATTCACTCCGAGAGTACAAACATCCTATTTACATATACTTCCC                
TCCATATGGTGAACTTCGAGGAGGATCGTGGGTTGTGGTGGACCCCACTATCAATGAGGACA                AGATGACCATGTTCTCAGATCCTGATGCTCGTGGTGGTATTCTCGAACCTGCTGGTATTGTA                GAAATCAAGTTCCGCTTGGCAGACCAGCTGAAAGCCATGCACCGCATTGATCCCCAGCTGAA                GATGCTAGATTCAGAGCTTGAGTCGACAGACGACACAGATGTCGCTGCTCAAGAAGCAATCA                AAGAGCAGATTGCTGCAAGAGAGGAGCTTCTTAAACCCGTCTATCTTCAGGCTGCTACTGAA                TTTGCTGATCTCCACGACAAGACGGGACGGATGAAGGCGAAGGGTGTTATCAAAGAAGCAGT                TCCATGGGCTCGCTCTCGTGAATACTTCTTTTATCTTGCTAAGCGCCGCATTTTTCAAGACA                ACTATGTGTTGCAAATCACTGCTGCTGATCCTTCGTTAGACTCTAAGGCTGCTCTTGAGGTG                TTGAAGAACATGTGCACTGCAGACTGGGATGACAACAAAGCCGTTCTTGACTATTATCTGTC                CAGCGATGGAGACATCACAGCCAAGATTAGCGAGATGAAGAAGGCAGCTATCAAGGCACAGA                TCGAGCAGCTTCAGAAAGCTTTGGAGGGTTGA SEQ ID NO:22                                 __________________________________________________________________________

The deduced amino acid sequence for the corresponding ACCase protein is:

    __________________________________________________________________________    MALRRGLYAAAATAILVTASVTAFAPQHSTFTPQSLSAAPTRNVFGQIKSAFFNHDVATSRT                ILHAATLDETVLSASDSVAKSVEDYVKSRGGNRVIRKVLIANNGMAATKSILSMRQWAYMEF                GDERAIQFVAMATPEDLKANAEFIRLADSFVEVPGGKNLNNYANVDVITRIAKEQGVDAVWP                GWGHASENPKLPNALDKLGIKFIGPTGPVMSVLGDKIAANILAQTAKVPSIPWSGSFGGPDD                GPLQADLTEEGTIPMEIFNKGLVTSADEAVIVANKIGWENGIMIKASEGGGGKGIRFVDNEA                DLRNAFVQVSNEVIGSPIFLMQLCKNARHIEVQIVGDQHGNAVALNGRDCSTQRRFQKIFEE                GPPSIVPKETFHEMELAAQRLTQNIGYQGAGTVEYLYNAADNKFFFLELNPRLQVEHPVTEG                ITGANLPATQLQVAMGIPLFNIPDIRRLYGREDAYGTDPIDFLQERYRELDSHVIAARITAE                NPDEGFKPTSGSIERIKFQSTPNVWGYFSVGANGGIHEFADSQFGHLFAKGPNREQARKALV                LALKEMEVRGDIRNSVEYLVKLLETEAFKKNTIDTSWLDGIIKEKSVKVEMPSHLVVVGAAV                FKAFEHVKVATEEVKESFRKGQVSTAGIPGINSFNIEVAYLDTKYPFHVERISPDVYRFTLD                GNTIDVEVTQTAEGALLATFGGETHRIFGMDEPLGLRLSLDGATVLMPTIFDPSELRTDVTG                KVVRYLQDNGATVEAGQPYVEVEAMKMIMPIKATESGKITHNLSAGSVISAGDLLASLELKD                PSRVKKIETFSGKLDIMESKVDLEPQKAVMNVLSGFNLDPEAVAQQAIDSATDSSAAADLLV                QVLDEFYRVESQFDGVIADDVVRTLTKANTETLDVVISENLAHQQLKRRSQLLLAMIRQLDT                FQDRFGREVPDAVIEALSRLSTLKDKSYGEIILAAEERVREAKVPSFEVRRADLRAKLADPE                TDLIDLSKSSTLSAGVDLLTNLFDDEDESVRAAAMEVYTRRVYRTYNIPELTVGVENGRLSC                SFSFQFADVPAKDRVTRQGFFSVIDDASKFAQQLPEILNSFGSKIAGDASKEGPVNVLQVGA                LSGDISIEDLEKATSANKDKLNMLGVRTVTALIPRGKKDPSYYSFPQCSGFKEDPLRRGMRP                TFHHLLELGRLEENFALERIPAVGRNVQIYVGSEKTARRNAAQVVFLRAISHTPGLTTFSGA                RRALLQGLDELERAQANSKVSVQSSSRIYLHSLPEQSDATPEEIAKEFEGVIDKLKSRLAQR                LTKLRVDEIETKVRVTVQDEDGSPRVVPVRLVASSMQGEWLKTSAYIDRPDPVTGVTRERCV                IGEGIDEVCELESYDSTSTIQTKRSIARRVGSTYAYDYLGLLEVSLLGEWDKYLSSLSGPDT                PTIPSNVFEAQELLEGPDGELVTGKREIGTNKVGMVAWVVTMKTPEYPEGRQVVVIVNDVTV                
QSGSFGVEEDEVFFKASKYARENKLPRVYIACNSGARIGLVDDLKPKFQIKFIDEASPSKGF                EYLYLDDATYKSLPEGSVNVRKVPEGWAITDIIGTNEGIGVENLQGSGKIAGETSRAYDEIF                TLSYVTGRSVGIGAYLVRLGQRIIQMKQGPMILTGYGALNKLLGREVYNSNDQLGGPQVMFP                NGCSHEIVDDDQQGIQSIIQWLSFVPKTTDAVSPVRECADPVNRDVQWRPTPTPYDPRLMLS                GTDEELGFFDTGSWKEYLAGWGKSVVIGRGRLGGIPMGAIAVETRLVEKIIPADPADPNSRE                AVMPQAGQVLFPDSSYKTAQALRDFNNEGLPVMIFANWRGFSGGSRDMSGEILKFGSMIVDS                LREYKHPIYIYFPPYGELRGGSWVVVDPTINEDKMTMFSDPDARGGILEPAGIVEIKFRLAD                QLKAMHRIDPQLKMLDSELESTDDTDVAAQEAIKEQIAAREELLKPVYLQAATEFADLHDKT                GRMKAKGVIKEAVPWARSREYFFYLAKRRIFQDNYVLQITAADPSLDSKAALEVLKNMCTAD                WDDNKAVLDYYLSSDGDITAKISEMKKAAIKAQIEQLQKALEG SEQ ID NO:23                      __________________________________________________________________________

The experimentally determined amino acid sequences are underlined below. Sequences used for design of the PR1 and PR2 PCR primers are double underlined.

    __________________________________________________________________________    MALRRGLYAAAATAILVTASVTAFAPQHSTFTPQSLSAAPTRNVFGQIKSAFFNHDVATS                                                                     60                         RTILHAATLDETVLSASDSVAKSVEDYVKSRGGNRVIRKVLIANNGMAATKSILSMRQWA                                                                     120                        YMEFGDERAIQFVAMATPEDLKANAEFIRLADSFVEVPGGKNLNNYANVDVITRIAKEQG                                                                     180                        VDAVWPGWGHASENPKLPNALDKLGIKFIGPTGPVMSVLGDKIAANILAQTAKVPSIPWS                                                                     240                        GSFGGPDDGPLQADLTEEGTIPMEIFNKGLVTSADEAVIVANKIGWENGIM IKASEGGGG                                                                    300                         KGIR FVDNEAD LRNAFVQVSNEVIGSPIFLM QLCKNARHIEVQIVGDQ HGNAVA LNGRDC                                                               360                        STQRRFQKIFEEGPPSIVPKETFHEMELAAQRLTQNIGYQGAGTVEYLYNAADNKFFFLE                                                                     420                        LNPRLQVEHPVTEGITGANLPATQLQVAMGIPLFNIPDIRRLYGREDAYGTDPIDFLQER                                                                     480                        YRELDSHVIAARITAENPDEGFKPTSGSIERIKFQSTPNVWGYFSVGANGGIHEFADSQF                                                                     540                        GHLFAKGPNREQARKALVLALKEMEVRGDIRNSVEYLVKLLETEAFKKNTIDTSWLDGII                                                                     600                        KEKSVKVEM PSHLVVVGAAVFKAFEHVKVATEEVKESFRKGQVSTAGIPGINSFNIEVAY                                                                    660                        LDTKYPFHVERISPDVYRFTLDGNTIDVEVTQTAEGALLATFGGETHRIFGMDEPLGLRL                                                                     720                        
SLDGATVLMPTIFDPSELRTDVTGKVVRYLQDNGATVEAGQPYVEVEAMKMIMPIKATES                                                                     780                        GKITHNLSAGSVISAGDLLASLELKDPSRVKKIETFSGKLDIMESKVDLEPQKAVM                                                                         840S                        GFNLDPEAVAQQAIDSATDSSAAADLLVQVLDEFYRVESQFDGVIADDVVRTLTKANTET                                                                    900                        LDVVISENLAHQQLKRRSQLLLAM IRQLDTFQDRFGREVPDAVIEALSRLSTLKDKSYGE                                                                    960                        IILAAEERVREAKVPSFEVRRADLRAKLADPETDLIDLSKSSTLSAGVDLLTNLFDDEDE                                                                     1020                       SVRAAAMEVYTRRVYRTYNIPELTVGVENGRLSCSFSFQFADVPAKDRVTRQGFFSVIDD                                                                     1080                       ASKFAQQLPEILNSFGSKIAGDASKEGPVNVLQVGALSGDISIEDLEKATSANKDKLNM                                                                      1140                        GVRTVTALIPRGKKDPSYYSFPQCSGFKEDPLRRGMRPTFHHLLELGRLEENFALERIPA                                                                    1200                       VGRNVQIYVGSEKTARRNAAQVVFLRAISHTPGLTTFSGARRALLQGLDELERAQANSKV                                                                     1260                       SVQSSSRIYLHSLPEQSDATPEEIAKEFEGVIDKLKSRLAQRLTKLRVDEIETKVRVTVQ                                                                     1320                       DEDGSPRVVPVRLVASSMQGEWLKTSAYIDRPDPVTGVTRERCVIGEGIDEVCELESYDS                                                                     1380                       TSTIQTKRSIARRVGSTYAYDYLGLLEVSLLGEWDKYLSSLSGPDTPTIPSNVFEAQELL                                                                     1440                       EGPDGELVTGKREIGTNKVGMVAWVVTMKTPEYPEGRQVVVIVNDVTVQSGSFGVEEDEV                                                                    
 1500                       FFKASKYARENKLPRVYIACNSGARIGLVDDLKPKFQIKFIDEASPSKGFEYLYLDDATY                                                                     1560                       KSLPEGSVNVRKVPEGWAITDIIGTNEGIGVENLQGSGKIAGETSRAYDEIFTLSYVTGR                                                                     1620                       SVGIGAYLVRLGQRIIQMKQGPMILTGYGALNKLLGREVYNSNDQLGGPQVMFPNGCSHE                                                                     1680                       IVDDDQQGIQSIIQWLSFVPKTTDAVSPVRECADPVNRDVQWRPTPTPYDPRLMLSGTDE                                                                     1740                       ELGFFDTGSWKEYLAGWGKSVVIGRGRLGGIPM GAIAVETRLVEKIIPADPADPNSREAV                                                                    1800                       M PQAGQVLFPDSSYKTAQALRDFNNEGLPVMIFANWRGFSGGSRDMSGEILKFGSMIVDS                                                                    1860                       LREYKHPIYIYFPPYGELRGGSWVVVDPTINEDKMTMFSDPDARGGILEPAGIVEIKFRL                                                                     1920                       ADQLKAMHRIDPQLKMLDSELESTDDTDVAAQEAIKEQIAAREELLKPVYLQAATEFADL                                                                     1980                       HDKTGRMKAKGVIKEAVPWARSREYFFYLAKRRIFQDNYVLQITAADPSLDSKAALEVLK                                                                     2040                       NMCTADWDDNKAVLDYYLSSDGDITAKISEMKKAAIKAQIEQLQKALEG  2089                       SEQ ID NO:24                                                                  __________________________________________________________________________

GENE ANALYSIS

The ACCase polypeptide from C. cryptica is predicted to be composed of 2089 amino acids and to have an unglycosylated molecular mass of 229,836 daltons before any post-translational modification. Previous research has indicated that C. cryptica ACCase co-migrates with myosin in SDS-PAGE gels; the molecular mass of the polypeptide was therefore previously estimated to be 185 to 200 kDa (Roessler, Plant Physiol. 92: 73-78 (1990)). This discrepancy is most likely attributable to inaccurate size estimation by SDS-PAGE or to post-translational cleavage of the protein. The N-terminal sequence of the predicted protein has characteristics of a signal sequence, with two positively charged arginine residues within the first five amino acids of the polypeptide, followed by a hydrophobic region (von Heijne, J. Membrane Biol. 115: 195-201 (1990)).
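The signal-sequence features noted above can be checked with a short script. The following is a minimal sketch, not part of the original analysis. It assumes that the first 60 bases of SEQ ID NO:22 are uninterrupted coding sequence, and it uses the Kyte-Doolittle hydropathy scale as one possible measure of hydrophobicity. The function names `translate` and `signal_features` are hypothetical.

```python
# First 60 bases of SEQ ID NO:22, copied from the sequence listing.
# Assumption: no intron interrupts this stretch of the genomic sequence.
N_TERMINAL_DNA = (
    "ATGGCTCTCCGTAGGGGCCTTTACGCTGCTGCAGCGACTGCCATCTTGGTCACGGCTTCA"
)

# Standard genetic code in the conventional TCAG codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# Kyte-Doolittle hydropathy values (positive means hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def translate(dna):
    """Translate a DNA string codon by codon, ignoring a trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def signal_features(protein, window=5):
    """Count arginines in the first `window` residues and average the
    hydropathy of the residues that follow."""
    head, tail = protein[:window], protein[window:]
    mean_hydropathy = sum(KYTE_DOOLITTLE[aa] for aa in tail) / len(tail)
    return head.count("R"), mean_hydropathy


peptide = translate(N_TERMINAL_DNA)
arginine_count, hydropathy = signal_features(peptide)
print(peptide, arginine_count, round(hydropathy, 2))
```

Under these assumptions the translated peptide is MALRRGLYAAAATAILVTAS. It carries two arginines within its first five residues, and the mean hydropathy of the remaining fifteen residues is positive.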

In eukaryotes, signal sequences direct proteins into the endoplasmic reticulum (ER). Signal sequences have also been shown to be necessary for transport of nuclear-encoded proteins into the chloroplasts of diatoms (Bhaya et al., Mol. Gen. Genet. 229: 400-404 (1991)). This observation is consistent with the fact that diatom chloroplasts are completely enclosed by closely appressed ER membranes (Gibbs, J. Cell Sci. 35: 253-266 (1979)). Fatty acid biosynthesis occurs primarily in the plastids of higher plants (Harwood, Ann. Rev. Plant Physiol. Plant Mol. Biol. 39: 101-138 (1988)). It is assumed that ACCase is located in the chloroplasts of diatoms, and therefore a signal sequence may be necessary for chloroplast targeting. Alternatively, it is possible that the cloned gene of the present invention is an ER-localized isoform of ACCase.

Diatoms produce substantial quantities of C₂₀ and C₂₂ fatty acids (primarily eicosapentaenoic acid and docosahexaenoic acid). In higher plants and diatoms, elongation of fatty acids to lengths greater than 18 carbons occurs within the ER, implying a need for malonyl-CoA in this cellular compartment (Harwood, Ann. Rev. Plant Physiol. Plant Mol. Biol. 39: 101-138 (1988); Schreiner et al., Plant Physiol. 96(S): 14 (1991)). However, malonyl-CoA is not able to pass through the chloroplast envelope, and therefore either an additional ACCase isoform exists outside of the chloroplast or there must be an alternative means of malonyl-CoA synthesis or transport. Accordingly, the present invention encompasses expressing the ACCase gene with and/or without a signal sequence to transport the enzyme into a plastid.

It should be noted, however, that the ACCase which was used in the Example for amino acid sequencing (and subsequent PCR primer design) was by far the most abundant ACCase in C. cryptica under the purification/assay conditions that were employed. It therefore appears likely that the cloned gene sequence recited above is for an ACCase that is responsible for chloroplastic fatty acid biosynthesis.

In order to test for the possible presence of compartment-specific ACCase isoforms, Southern blots of C. cryptica total DNA that had been digested with five different restriction enzymes were probed with the ACCase-encoding 146-bp PCR product described above. Total DNA (10 μg) isolated from C. cryptica was digested for 18 h at 37° C. with 40 units of either EcoRI, EcoRV, HindIII, PstI, or SacI. Agarose gel electrophoresis and alkaline blotting were carried out under standard conditions (Sambrook et al., Molecular Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Laboratory Press, N.Y. (1989)). The prehybridization, hybridization, and washing steps were performed as described above for genomic library screening. The results suggest the presence of a single isoform. If additional isoforms do exist, the sequences of the genes must be different enough in this region to prevent cross-hybridization under the conditions utilized. The fact that ACCase must pass through the ER in order to enter the chloroplast raises the possibility that this one isoform could actually be functional in two distinct cellular compartments.
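The Southern blot itself is a laboratory procedure, but the positions at which the five enzymes would cut a known sequence can be located in silico. The sketch below is illustrative only. It scans the first 60 bases of SEQ ID NO:22, which is not the 146-bp probe region, and it uses the well-known recognition sequences of the five enzymes. The function name `find_sites` is hypothetical, and predicting real fragment sizes would need the complete genomic sequence.

```python
# Recognition sequences of the five enzymes used for the Southern blots.
# All five are palindromic, so scanning one strand is sufficient.
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "EcoRV": "GATATC",
    "HindIII": "AAGCTT",
    "PstI": "CTGCAG",
    "SacI": "GAGCTC",
}

# First 60 bases of SEQ ID NO:22, copied from the sequence listing.
# Assumption: used here only as a short example fragment.
FRAGMENT = "ATGGCTCTCCGTAGGGGCCTTTACGCTGCTGCAGCGACTGCCATCTTGGTCACGGCTTCA"


def find_sites(dna, site):
    """Return the 0-based start index of every occurrence of `site` in `dna`,
    including overlapping occurrences."""
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start)
        start = dna.find(site, start + 1)
    return positions


site_map = {
    enzyme: find_sites(FRAGMENT, site)
    for enzyme, site in RECOGNITION_SITES.items()
}

for enzyme, positions in site_map.items():
    print(enzyme, positions)
```

In this short fragment only a PstI site is found. The other four enzymes have no recognition sequence within these 60 bases.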

Several other features of the predicted ACCase primary structure warrant discussion. Two computer alignment programs (MACAW and ALIGN) were used to search for regions of the ACCase amino acid sequences from rat, yeast, and C. cryptica that were similar. The MACAW program was developed by Schuler et al. (Schuler et al., Proteins Struct. Funct. Genet. 9: 180-190 (1991)) and the ALIGN program (Scientific and Educational Software, State Line, Pa.) is based on the method of Myers and Miller (Myers et al., CABIOS 4: 11-17 (1988)). Calculations for "% identity" used the ALIGN program with default penalties for mismatches, gap introductions, and gap elongation.
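As a rough illustration of what a "% identity" figure measures, the sketch below compares two equal-length segments from the sequence listing (SEQ ID NO:5 and SEQ ID NO:6) position by position. This is a simplification and an assumption of this example. It performs no gapped alignment and applies no penalties, so it does not reproduce the ALIGN program or the Myers and Miller method. The function names are hypothetical.

```python
# Two 20-residue segments in three-letter code, copied from the sequence listing.
SEQ_ID_NO_5 = "GlyLysSerValValIleGlyArgGlyArgLeuGlyGlyIleProMetGlyAlaIleAla"
SEQ_ID_NO_6 = "AlaLysGlyValValValGlyArgAlaArgLeuGlyGlyIleProLeuGlyValIleGly"


def split_residues(three_letter):
    """Split a run of three-letter amino acid codes into a list of residues."""
    return [three_letter[i:i + 3] for i in range(0, len(three_letter), 3)]


def percent_identity(first, second):
    """Percentage of positions with identical residues in two pre-aligned,
    gap-free sequences of equal length."""
    residues_a = split_residues(first)
    residues_b = split_residues(second)
    if len(residues_a) != len(residues_b):
        raise ValueError("sequences must be the same length (no gaps handled)")
    matches = sum(a == b for a, b in zip(residues_a, residues_b))
    return 100.0 * matches / len(residues_a)


identity = percent_identity(SEQ_ID_NO_5, SEQ_ID_NO_6)
print(f"{identity:.1f}% identity over {len(split_residues(SEQ_ID_NO_5))} residues")
```

For full-length proteins of differing lengths, a gapped global alignment such as the one ALIGN performs would be needed before identities are counted.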

In the region of the C. cryptica ACCase polypeptide that includes the biotin carboxylase domain (residues 1 to 620), there is 52% and 50% identity with the rat and yeast ACCase sequences, respectively. Likewise, the region of C. cryptica ACCase that includes the carboxyltransferase domain (residues 1426 to 2089) exhibits 50% identity with both the rat and yeast sequences. Therefore, considerable variations can be made to the sequence while maintaining the biological activity.

On the other hand, there is less sequence conservation in the middle region of the protein among any of these ACCase enzymes (30% identity, with the bulk of this similarity occurring in the vicinity of the biotin binding site). This relationship is graphically demonstrated by the homology plots of FIG. 1. This middle region, which includes portions of the biotin carboxyl carrier protein domain, may be little more than a spacer region that facilitates the physical movement of the carboxylated biotin from the biotin carboxylase active site to the carboxyltransferase active site. In this case, a high degree of sequence conservation would not be expected.

Variants of ACCase may be constructed using the principle of maintaining a high degree of homology in the conserved regions and making any of a large number of changes to the regions which are not conserved.

Unlike the multifunctional fatty acid synthase enzyme from animals and yeast (McCarthy et al., Trends Biochem. Sciences 9: 60-63 (1984)), the domains of ACCases from animals, yeast, and C. cryptica are in the same relative positions. This suggests either that an early, single gene fusion event occurred in the course of evolution or that there is a strict, functional requirement for this particular arrangement.

The presumed biotin binding site is a lysine residue (No. 770) that is flanked by two methionines. This tripeptide has been observed in every biotin-containing enzyme for which the amino acid sequence is known. Another characteristic of this region is the presence of one or more proline residues approximately 25 to 30 positions upstream from the biotin binding site that are believed to form a hinge region for carboxybiotin movement (Samols et al., J. Biol. Chem. 263: 6461-6464 (1988)). Proline residues are also found at this location in C. cryptica ACCase, although they are displaced five to six residues toward the N-terminus in C. cryptica ACCase relative to yeast and animal ACCases.
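The position of the Met-Lys-Met tripeptide can be confirmed directly from the amino acid sequence table. The sketch below is a minimal check. It assumes that the table row whose end-of-row counter reads 780 holds residues 721 to 780 (60 residues per row). The function names are hypothetical.

```python
# Row of the C. cryptica ACCase amino acid table that ends at residue 780.
SEGMENT = "SLDGATVLMPTIFDPSELRTDVTGKVVRYLQDNGATVEAGQPYVEVEAMKMIMPIKATES"
SEGMENT_START = 721  # assumption: 60 residues per row, so this row starts at 721


def find_biotin_lysine(segment, start):
    """Return the residue number of the lysine in the first Met-Lys-Met
    tripeptide, or None if the tripeptide is absent."""
    index = segment.find("MKM")
    if index == -1:
        return None
    return start + index + 1  # +1 moves from the first Met to the Lys


def upstream_proline_distances(segment, start, lysine_residue):
    """Distances (in residues) from the lysine to each proline on its
    N-terminal side within the segment."""
    return [
        lysine_residue - (start + i)
        for i, residue in enumerate(segment)
        if residue == "P" and start + i < lysine_residue
    ]


lysine = find_biotin_lysine(SEGMENT, SEGMENT_START)
distances = upstream_proline_distances(SEGMENT, SEGMENT_START, lysine)
print(lysine, distances)
```

The lysine is found at residue 770, in agreement with the text. The proline distances that are printed can be compared with the 25 to 30 residue spacing reported for other biotin-containing enzymes.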

Regions of the carboxyltransferase subunit from E. coli that are proposed to be involved in acetyl-CoA and carboxybiotin binding have been identified (Li et al., J. Biol. Chem. 267: 16841-16847 (1992)). Another highly conserved region is the putative ATP-binding site of the biotin carboxylase domain/subunit. A comparison of the amino acid sequence in these areas of ACCase from C. cryptica, yeast, rat and E. coli is shown in FIG. 2. Accordingly, while the nucleotide sequence may be changed significantly, careful selection of any variation in the amino acid sequence in these regions is needed. Additionally, changes in these areas may be desirable for making changes in the enzyme's activity or properties.
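One way to see which positions in such a region tolerate no variation is to list the alignment columns that are identical in every sequence. The sketch below does this for SEQ ID NO:9 through SEQ ID NO:12 from the sequence listing. It is an assumption of this example that these four 21-residue segments form one of the aligned sets compared in FIG. 2 and that they are already aligned without gaps. The function names are hypothetical.

```python
# Four 21-residue segments in three-letter code, copied from the sequence listing.
ALIGNED_SEGMENTS = {
    "SEQ ID NO:9": "GluAsnGlyIleMetIleLysAlaSerGluGlyGlyGlyGlyLysGlyIleArgPheValAsp",
    "SEQ ID NO:10": "GlyPheProValMetIleLysAlaSerGluGlyGlyGlyGlyLysGlyIleArgGlnValGlu",
    "SEQ ID NO:11": "GlyTyrProAsxMetIleLysAlaSerGluGlyGlyGlyGlyLysGlyIleArgLysAsxAsn",
    "SEQ ID NO:12": "GlyTyrProValIleIleLysAlaSerGlyGlyGlyGlyGlyArgGlyMetArgValValArg",
}


def split_residues(three_letter):
    """Split a run of three-letter amino acid codes into a list of residues."""
    return [three_letter[i:i + 3] for i in range(0, len(three_letter), 3)]


def invariant_columns(segments):
    """Return the 1-based positions at which every segment has the same residue.
    Assumes the segments are pre-aligned and of equal length."""
    rows = [split_residues(sequence) for sequence in segments.values()]
    if len({len(row) for row in rows}) != 1:
        raise ValueError("segments must all have the same length")
    return [
        position
        for position, column in enumerate(zip(*rows), start=1)
        if len(set(column)) == 1
    ]


conserved = invariant_columns(ALIGNED_SEGMENTS)
print(conserved)
```

Positions that appear in the output are candidates for the careful treatment described above, while the remaining positions vary among the four segments.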

The foregoing description of the specific embodiments reveals the general nature of the invention so that others can, by applying current knowledge, readily modify and/or adapt for various applications such specific embodiments without departing from the generic concept, and, therefore, such adaptations and modifications should and are intended to be comprehended within the meaning and range of equivalents of the disclosed embodiments. It is to be understood that the phraseology or terminology employed herein is for the purpose of description and not of limitation.

All references mentioned in this application are incorporated by reference.

    __________________________________________________________________________    SEQUENCE LISTING                                                              (1) GENERAL INFORMATION:                                                      (iii) NUMBER OF SEQUENCES: 25                                                 (2) INFORMATION FOR SEQ ID NO:1:                                              (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 51 amino acids                                                    (B) TYPE: amino acid                                                          (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                          (ii) MOLECULE TYPE: protein                                                   (iii) HYPOTHETICAL: NO                                                        (iv) ANTI-SENSE: NO                                                           (v) FRAGMENT TYPE: internal                                                   (xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:                                       GlyArgGlnValValValIleValAsnAspValThrValGlnSerGly                              151015                                                                        SerPheGlyValGluGluAspGluValPhePheLysAlaSerLysTyr                              202530                                                                        AlaArgGluAsnLysLeuProArgValTyrIleAlaCysAsnSerGly                              354045                                                                        AlaArgIle                                                                     50                                                                            (2) INFORMATION FOR SEQ ID NO:2:                                              (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 51 amino acids                    
                                (B) TYPE: amino acid                                                          (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                          (ii) MOLECULE TYPE: protein                                                   (iii) HYPOTHETICAL: NO                                                        (iv) ANTI-SENSE: NO                                                           (v) FRAGMENT TYPE: internal                                                   (xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:                                       GlyArgGlnPheValValValAlaAsnAspIleThrPheLysIleGly                              151015                                                                        SerPheGlyProGlnGluAspGluPhePheAsnLysValThrGluTyr                              202530                                                                        AlaArgLysArgGlyIleProArgIleTyrLeuAlaAlaAsnSerGly                              354045                                                                        AlaArgIle                                                                     50                                                                            (2) INFORMATION FOR SEQ ID NO:3:                                              (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 51 amino acids                                                    (B) TYPE: amino acid                                                          (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                          (ii) MOLECULE TYPE: protein                                                   (iii) HYPOTHETICAL: NO                                                        (iv) ANTI-SENSE: NO                                                           (v) FRAGMENT 
TYPE: internal                                                   (xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:                                       GlyArgAspValIleValIleGlyAsnAspIleThrTyrArgIleGly                              151015                                                                        SerPheGlyProGlnGluAspLeuLeuPheLeuArgAlaSerGluLeu                              202530                                                                        AlaArgAlaGluGlyIleProArgIleTyrValAlaAlaAsnSerGly                              354045                                                                        AlaArgIle                                                                     50                                                                            (2) INFORMATION FOR SEQ ID NO:4:                                              (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 51 amino acids                                                    (B) TYPE: amino acid                                                          (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                          (ii) MOLECULE TYPE: protein                                                   (iii) HYPOTHETICAL: NO                                                        (iv) ANTI-SENSE: NO                                                           (v) FRAGMENT TYPE: internal                                                   (xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:                                       GlyMetProValValAlaAlaAlaPheGluPheAlaPheMetGlyGly                              151015                                                                        SerMetGlySerValValGlyAlaArgPheValArgAlaValGluGln                              202530                                                                        AlaLeuGluAspAsnCysProLeuIleCysPheSerAlaSerGlyGly               
               354045                                                                        AlaArgMet                                                                     50                                                                            (2) INFORMATION FOR SEQ ID NO:5:                                              (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 20 amino acids                                                    (B) TYPE: amino acid                                                          (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                          (ii) MOLECULE TYPE: protein                                                   (iii) HYPOTHETICAL: NO                                                        (iv) ANTI-SENSE: NO                                                           (v) FRAGMENT TYPE: internal                                                   (xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:                                       GlyLysSerValValIleGlyArgGlyArgLeuGlyGlyIleProMet                              151015                                                                        GlyAlaIleAla                                                                  20                                                                            (2) INFORMATION FOR SEQ ID NO:6:                                              (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 20 amino acids                                                    (B) TYPE: amino acid                                                          (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                          (ii) MOLECULE TYPE: protein                                                   (iii) HYPOTHETICAL: NO             
                                           (iv) ANTI-SENSE: NO                                                           (v) FRAGMENT TYPE: internal                                                   (xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:                                       AlaLysGlyValValValGlyArgAlaArgLeuGlyGlyIleProLeu                              151015                                                                        GlyValIleGly                                                                  20                                                                            (2) INFORMATION FOR SEQ ID NO:7:                                              (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 20 amino acids                                                    (B) TYPE: amino acid                                                          (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                          (ii) MOLECULE TYPE: protein                                                   (iii) HYPOTHETICAL: NO                                                        (iv) ANTI-SENSE: NO                                                           (v) FRAGMENT TYPE: internal                                                   (xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:                                       AlaGlnThrValValValGlyArgAlaArgLeuGlyGlyIleProVal                              151015                                                                        GlyValValAla                                                                  20                                                                            (2) INFORMATION FOR SEQ ID NO:8:                                              (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 20 amino acids                                                    (B) 
TYPE: amino acid                                                          (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                          (ii) MOLECULE TYPE: protein                                                   (iii) HYPOTHETICAL: NO                                                        (iv) ANTI-SENSE: NO                                                           (v) FRAGMENT TYPE: internal                                                   (xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:                                       AspLysAlaIleValGlyGlyIleAlaArgLeuAspGlyArgProVal                              151015                                                                        MetIleIleGly                                                                  20                                                                            (2) INFORMATION FOR SEQ ID NO:9:                                              (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 21 amino acids                                                    (B) TYPE: amino acid                                                          (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                          (ii) MOLECULE TYPE: protein                                                   (iii) HYPOTHETICAL: NO                                                        (iv) ANTI-SENSE: NO                                                           (v) FRAGMENT TYPE: internal                                                   (xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:                                       GluAsnGlyIleMetIleLysAlaSerGluGlyGlyGlyGlyLysGly                              151015                                                                        IleArgPheValAsp                                       
                        20                                                                            (2) INFORMATION FOR SEQ ID NO:10:                                             (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 21 amino acids                                                    (B) TYPE: amino acid                                                          (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                          (ii) MOLECULE TYPE: protein                                                   (iii) HYPOTHETICAL: NO                                                        (iv) ANTI-SENSE: NO                                                           (v) FRAGMENT TYPE: internal                                                   (xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:                                      GlyPheProValMetIleLysAlaSerGluGlyGlyGlyGlyLysGly                              151015                                                                        IleArgGlnValGlu                                                               20                                                                            (2) INFORMATION FOR SEQ ID NO:11:                                             (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 21 amino acids                                                    (B) TYPE: amino acid                                                          (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                          (ii) MOLECULE TYPE: protein                                                   (iii) HYPOTHETICAL: NO                                                        (iv) ANTI-SENSE: NO                                                           (v) FRAGMENT TYPE: 
internal                                                   (xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:                                      GlyTyrProAsxMetIleLysAlaSerGluGlyGlyGlyGlyLysGly                              151015                                                                        IleArgLysAsxAsn                                                               20                                                                            (2) INFORMATION FOR SEQ ID NO:12:                                             (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 21 amino acids                                                    (B) TYPE: amino acid                                                          (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                          (ii) MOLECULE TYPE: protein                                                   (iii) HYPOTHETICAL: NO                                                        (iv) ANTI-SENSE: NO                                                           (v) FRAGMENT TYPE: internal                                                   (xi) SEQUENCE DESCRIPTION: SEQ ID NO:12:                                      GlyTyrProValIleIleLysAlaSerGlyGlyGlyGlyGlyArgGly                              151015                                                                        MetArgValValArg                                                               20                                                                            (2) INFORMATION FOR SEQ ID NO:13:                                             (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 20 bases                                                          (B) TYPE: nucleic acid                                                        (C) STRANDEDNESS: single                                             
         (D) TOPOLOGY: linear                                                          (ii) MOLECULE TYPE: DNA (genomic)                                             (iii) HYPOTHETICAL: NO                                                        (iv) ANTI-SENSE: NO                                                           (xi) SEQUENCE DESCRIPTION: SEQ ID NO:13:                                      TTYGTNTGGAAYGARGCNGA20                                                        (2) INFORMATION FOR SEQ ID NO: 14:                                            (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 20 bases                                                          (B) TYPE: nucleic acid                                                        (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                          (ii) MOLECULE TYPE: DNA (genomic)                                             (iii) HYPOTHETICAL: NO                                                        (iv) ANTI-SENSE: NO                                                           (xi) SEQUENCE DESCRIPTION: SEQ ID NO:14:                                      ACNGCRTTNCCRTGYTGRTC20                                                        (2) INFORMATION FOR SEQ ID NO:15:                                             (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 35 amino acids                                                    (B) TYPE: amino acid                                                          (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                          (ii) MOLECULE TYPE: protein                                                   (iii) HYPOTHETICAL: NO                                                        (iv) ANTI-SENSE: NO                      
                                     (v) FRAGMENT TYPE: internal                                                   (xi) SEQUENCE DESCRIPTION: SEQ ID NO:15:                                      LeuArgAsnAlaPheValGlnValSerAsnGluValIleGlySerPro                              151015                                                                        IlePheLeuMetGlnLeuCysLysAsnAlaArgHisIleGluValGln                              202530                                                                        IleValGly                                                                     35                                                                            (2) INFORMATION FOR SEQ ID NO:16:                                             (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 35 amino acids                                                    (B) TYPE: amino acid                                                          (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                          (ii) MOLECULE TYPE: protein                                                   (iii) HYPOTHETICAL: NO                                                        (iv) ANTI-SENSE: NO                                                           (v) FRAGMENT TYPE: internal                                                   (xi) SEQUENCE DESCRIPTION: SEQ ID NO:16:                                      PheProAsnLeuPheArgGlnValGlnAlaGluValProGlySerPro                              151015                                                                        IlePheValMetArgLeuAlaLysGlnSerArgHisLeuGluValGln                              202530                                                                        IleLeuAla                                                                     35                                                                            (2) 
INFORMATION FOR SEQ ID NO:17:                                             (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 17 bases                                                          (B) TYPE: nucleic acid                                                        (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                          (ii) MOLECULE TYPE: DNA (genomic)                                             (iii) HYPOTHETICAL: NO                                                        (iv) ANTI-SENSE: NO                                                           (xi) SEQUENCE DESCRIPTION: SEQ ID NO:17:                                      TGTCCAATTTGCCCGAA17                                                           (2) INFORMATION FOR SEQ ID NO:18:                                             (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 17 bases                                                          (B) TYPE: nucleic acid                                                        (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                          (ii) MOLECULE TYPE: DNA (genomic)                                             (iii) HYPOTHETICAL: NO                                                        (iv) ANTI-SENSE: NO                                                           (xi) SEQUENCE DESCRIPTION: SEQ ID NO:18:                                      TAAAGTTGAGATGCCCT17                                                           (2) INFORMATION FOR SEQ ID NO:19:                                             (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 17 bases                                                          (B) TYPE: nucleic acid                                
                        (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                          (ii) MOLECULE TYPE: DNA (genomic)                                             (iii) HYPOTHETICAL: NO                                                        (iv) ANTI-SENSE: NO                                                           (xi) SEQUENCE DESCRIPTION: SEQ ID NO:19:                                      CCAAACGGCATCAACCC17                                                           (2) INFORMATION FOR SEQ ID NO:20:                                             (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 17 bases                                                          (B) TYPE: nucleic acid                                                        (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                          (ii) MOLECULE TYPE: DNA (genomic)                                             (iii) HYPOTHETICAL: NO                                                        (iv) ANTI-SENSE: NO                                                           (xi) SEQUENCE DESCRIPTION: SEQ ID NO:20:                                      GTTGGCGTAGTTGTTCA17                                                           (2) INFORMATION FOR SEQ ID NO:21:                                             (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 17 bases                                                          (B) TYPE: nucleic acid                                                        (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                          (ii) MOLECULE TYPE: DNA (genomic)                                             (iii) HYPOTHETICAL: NO    
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:21:
GCATTTCCTCACGATAG 17

(2) INFORMATION FOR SEQ ID NO:22:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 6790 bases
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:22:
ATGGCTCTCCGTAGGGGCCTTTACGCTGCTGCAGCGACTGCCATCTTGGTCACGGCTTCA 60
GTGACCGCTTTTGGTAAGTCTGCATTTGGATTGATGGTTAGCATTCCCCACGAGCAGCAT 120
GTTGTGTTACGCGTTGTTGCGTAGTGTCAGTTGTGATAATTATGATCGACAAGAATGGGA 180
GGACTCTTTTTGTATCGTTTGTAGAGTGTTACACTGGACCTTCGCCTAAACACGTTTGGA 240
GGTCCTCACATCCGCGACGAGAGCTCCCACATTTCATCTACATCTCTACGTGAGCGAATT 300
TACGTCACCTGGCTATTCATTTGAGGTCCCTTCCTCCCACGTGCTTCCATGTTCCTTAGG 360
GCGCTTAAGCATAGTTGCACTTGGAGCACTTGTTGTCAAATTGTCGTGTACCCGTCACTT 420
TCGAAGCGTTATTTGGGGTTGGCTGGTCCTATTTAAACAGAAATTATTACGATGTTTCGC 480
TAACGATTCTTTCTCTCATTTTTTAACCTACACGAAACAGCTCCTCAGCATTCGACATTC 540
ACCCCCCAATCGCTCTCGGCGGCACCCACGCGCAACGTCTTCGGCCAGATCAAAAGCGCC 600
TTCTTCAACCATGATGTTGCCACCTCTCGAACCATTCTTCACGCCGCGACACTAGATGAA 660
ACTGTTCTTTCCGCTTCAGACTCCGTCGCCAAATCTGTCGAAGACTACGTGAAATCCCGT 720
  GGTGGAAATCGCGTCATTCGTAAAGTCCTCATCGCCAACAACGGCATGGCCGCGACAAAG780               TCCATCCTCTCCATGCGTCAATGGGCCTACATGGAATTCGGGGACGAACGTGCCATCCAG840               TTCGTTGCGATGGCGACTCCCGAGGATTTGAAGGCGAACGCCGAATTTATTCGCTTGGCG900               GATTCTTTCGTCGAGGTACCGGGAGGAAAGAACTTGAACAACTACGCCAACGTCGATGTC960               ATTACCCGCATCGCTAAGGAGCAGGGGGTTGATGCCGTTTGGCCTGGATGGGGTCATGCA1020              TCTGAGAATCCGAAGCTCCCTAATGCGCTTGACAAATTGGGAATCAAGTTCATTGGACCA1080              ACTGGGCCTGTCATGAGCGTTTTGGGAGACAAGATTGCTGCGAACATTCTAGCACAGACA1140              GCGAAAGTCCCCTCCATTCCCTGGAGTGGATCCTTTGGTGGACCAGACGATGGACCCCTT1200              CAGGCGGATCTGACCGAGGAGGGTACTATCCCAATGGAAATCTTTAACAAGGGATTAGTA1260              ACCTCTGCTGATGAAGCCGTCATTGTGGCGAACAAGATTGGCTGGGAGAACGGAATCATG1320              ATCAAGGCTTCTGAGGGTGGAGGAGGAAAGGGTATACGCTTTGTCGACAATGAGGCCGAC1380              TTACGGAACGCGTTCGTTCAGGTGTCCAATGAAGTGATTGGCTCTCCTATTTTCCTCATG1440              CAGTTGTGTAAGAACGCTCGTCACATCGAAGTGCAAATTGTTGGCGACCAGCACGGAAAT1500              GCTGTAGCGTTGAACGGTCGAGATTGCTCCACTCAGCGTCGCTTCCAGAAGATCTTCGAG1560              GAAGGTCCTCCGTCCATTGTACCGAAAGAAACATTCCACGAGATGGAACTTGCGGCTCAA1620              CGGTTGACTCAAAACATTGGGTATCAAGGTGCTGGAACTGTGGAATACTTGTACAACGCC1680              GCTGACAATAAGTTTTTCTTCCTTGAGTTGAACCCCCGTCTCCAAGTGGAGCATCCTGTG1740              ACTGAAGGAATTACCGGCGCTAATCTTCCTGCCACTCAGCTTCAAGTTGCTATGGGTATT1800              CCTCTCTTCAACATTCCTGACATTCGCCGTCTCTATGGAAGAGAGGATGCTTACGGAACG1860              GATCCCATTGATTTTCTTCAAGAACGTTACCGCGAACTCGACTCTCATGTAATTGCTGCC1920              CGCATCACTGCTGAAAACCCCGATGAAGGATTCAAACCCACCTCAGGCTCAATTGAGCGA1980              ATCAAATTTCAATCCACCCCAAATGTTTGGGGATATTTCTCTGTTGGTGCTAACGGTGGA2040              ATCCATGAATTTGCCGACTCTCAGTTTGGCCATCTTTTCGCTAAGGGTCCGAACCGTGAG2100              CAAGCCCGCAAGGCATTGGTTTTGGCTCTTAAGGAGATGGAAGTGCGCGGAGACATTCGT2160              AACTCTGTTGAATACCTAGTCAAGTTGCTCGAAACTGAAGCTTTCAAGAAGAACACTATC2220              
GACACGTCTTGGTTAGATGGCATTATTAAGGAGAAGTCCGTTAAAGTTGAGATGCCCTCT2280              CACTTAGTGGTTGTCGGAGCCGCTGTTTTCAAGGCCTTCGAACATGTTAAGGTGGCCACT2340              GAAGAAGTTAAGGAATCGTTTCGAAAAGGACAAGTCTCCACTGCAGGGATTCCAGGCATA2400              AACTCGTTCAACATCGAAGTTGCGTACTTAGACACGAAGTACCCATTCCACGTAGAACGG2460              ATCTCTCCAGATGTTTACAGGTTTACCTTGGACGGGAACACGATTGATGTGGAAGTTACC2520              CAAACCGCTGAAGGAGCACTTTTGGCAACCTTTGGAGGAGAGACTCATCGTATCTTTGGT2580              ATGGACGAACCACTTGGCCTTCGACTGTCATTGGACGGGGCAACTGTCCTAATGTAAGTT2640              GTCTGTCCCTCGATGTCGCTGTTTCATCTGTAGTCAAGTATCCTCACCTTATGTACTTAT2700              TCGTAGGCCAACAATTTTTGACCCCTCTGAACTCCGCACTGATGTGACTGGAAAGGTTGT2760              TCGTTACCTCCAAGACAATGGAGCAACTGTTGAAGCGGGCCAGCCCTATGTCGAGGTTGA2820              AGCGATGAAGATGATCATGCCAATCAAGGCTACTGAGTCTGGAAAAATTACTCACAACCT2880              AAGTGCTGGATCTGTAATCTCTGCTGGTGACCTTCTTGCTTCTCTCGAACTTAAGGATCC2940              CTCTAGGGTTAAGAAAATAGAAACTTTTTCGGGCAAATTGGACATTATGGAATCGAAGGT3000              TGACTTAGAACCGCAGAAAGCAGTCATGAATGTCCTCTCTGGGTTCAACTTAGACCCTGA3060              GGCAGTTGCGCAGCAAGCAATTGACAGTGCTACCGACAGCTCTGCCGCAGCCGATCTTCT3120              TGTCCAAGTATTAGACGAATTCTATCGCGTTGAATCTCAGTTTGATGGTGTCATCGCTGA3180              TGATGTTGTCCGCACTCTCACCAAAGCGAACACCGAGACACTTGATGTTGTCATCTCCGA3240              GAACTTGGCCCACCAGCAGCTCAAGAGGCGTAGTCAGCTTCTCCTCGCTATGATCCGTCA3300              ACTTGACACGTTTCAAGACAGATTTGGCAGAGAAGTTCCGGATGCTGTCATTGAAGCATT3360              GAGTAGGCTTTCTACCTTGAAAGACAAATCTTACGGTGAAATCATTCTTGCGGCTGAGGA3420              GAGAGTCCGCGAAGCCAAGGTGCCGTCCTTCGAAGTGCGTCGTGCTGATTTGCGTGCAAA3480              GCTTGCTGACCCGGAGACAGATTTGATTGACCTGAGTAAGAGCTCAACACTCTCAGCAGG3540              GGTTGACCTTCTCACAAATCTTTTTGATGACGAAGATGAATCTGTCCGCGCTGCTGCTAT3600              GGAAGTATATACTCGCCGTGTCTACCGTACCTACAACATCCCCGAGCTAACTGTTGGAGT3660              TGAGAATGGCCGCCTCTCATGTAGCTTCTCCTTCCAATTTGCTGATGTCCCGGCGAAAGA3720              
CCGTGTCACCCGCCAAGGGTTCTTCTCAGTTATCGACGACGCTTCAAAGTTCGCGCAACA3780              GCTTCCTGAGATTCTCAACTCGTTTGGATCAAAGATCGCAGGGGATGCAAGCAAAGAAGG3840              CCCTGTCAATGTTTTGCAGGTTGGTGCTCTCTCGGGAGATATCAGTATTGAGGACCTCGA3900              GAAAGCTACTTCCGCTAACAAGGACAAGTTGAATATGCTTGGTGTCCGCACTGTGACGGC3960              TCTTATCCCAAGGGGAAAGAAGGACCCAAGCTATTATTCATTCCCCCAATGCAGTGGCTT4020              CAAGGAGGATCCTCTTCGCAGAGGCATGCGCCCAACCTTTCATCATCTCCTGGAACTCGG4080              ACGGCTGGAGGAAAACTTTGCTCTTGAACGAATTCCTGCAGTTGGACGCAACGTACAGAT4140              TTATGTTGGTTCCGAGAAGACGGCAAGGCGAAATGCAGCTCAAGTTGTTTTCTTGAGAGC4200              TATCTCACATACTCCTGGCCTAACTACCTTCTCTGGTGCACGCCGAGCTCTTCTCCAGGG4260              GCTTGACGAATTGGAACGTGCTCAAGCAAACTCAAAGGTCAGTGTCCAGTCATCGTCTCG4320              CATCTACCTTCACTCTCTCCCAGAACAGTCTGATGCAACTCCCGAGGAGATTGCTAAAGA4380              ATTCGAAGGTGTCATTGACAAGCTAAAGAGTCGATTGGCCCAACGTCTTACGAAACTGCG4440              TGTGGATGAGATTGAAACCAAGGTTCGCGTGACTGTCCAGGATGAAGACGGTAGTCCCAG4500              GGTTGTGCCTGTACGCCTTGTGGCTTCTTCAATGCAAGGCGAATGGCTTAAAACATCTGC4560              TTACATTGATCGTCCGGACCCGGTCACTGGAGTCACCCGTGAACGGTGCGTGATTGGAGA4620              AGGCATTGACGAGGTTTGTGAACTTGAGTCGTATGACTCTACCAGTACCATCCAAACAAA4680              GCGCTCAATTGCAAGACGTGTGGGATCTACCTACGCTTATGACTACCTTGGACTCCTTGA4740              GGTCAGCTTGCTTGGAGAATGGGATAAGTATCTCAGCAGTCTCTCAGGACCGGACACCCC4800              TACCATCCCGTCGAATGTTTTTGAAGCTCAAGAGTTACTTGAAGGACCTGATGGCGAGCT4860              TGTCACCGGGAAACGTGAAATTGGAACAAATAAGGTTGGTATGGTTGCATGGGTGGTAAC4920              AATGAAAACACCTGAATATCCTGAGGGTCGACAGGTTGTTGTAATTGTGAACGATGTCAC4980              TGTACAAAGTGGTTCATTTGGAGTTGAGGAGGATGAAGTTTTCTTCAAGGCCTCCAAATA5040              TGCTCGCGAAAATAAGCTCCCCCGTGTCTACATTGCGTGCAACTCTGGTGCTAGAATTGG5100              TTTGGTGGATGATCTCAAGCCAAAGTTCCAGATCAAATTCATTGATGAGGCGAGTCCATC5160              TAAGGGTTTTGAGTACCTTTATCTTGATGATGCAACGTACAAATCTCTTCCAGAAGGGTC5220              
GGTAAATGTAAGGAAGGTCCCTGAAGGCTGGGCTATCACTGATATCATTGGAACGAACGA5280              AGGAATTGGGGTTGAGAACCTTCAAGGAAGTGGCAAAATTGCTGGCGAGACATCAAGGGC5340              ATATGATGAAATCTTCACCTTGAGTTACGTCACAGGTAGAAGTGTTGGTATTGGAGCTTA5400              CCTTGTCCGTCTCGGCCAGCGTATTATTCAGATGAAACAAGGACCCATGATTCTCACAGG5460              CTATGGTGCCCTGAATAAGCTTCTCGGCCGTGAAGTGTACAACTCAAACGACCAACTTGG5520              TGGTCCTCAAGTCATGTTCCCAAACGGCTGCTCTCATGAAATTGTAGATGATGACCAACA5580              AGGCATCCAGTCCATTATCCAATGGCTAAGCTTTGTTCCCAAGACAACTGATGCTGTGTC5640              ACCCGTCCGTGAATGTGCCGACCCTGTCAACAGGGATGTTCAATGGCGCCCTACCCCCAC5700              TCCTTATGATCCACGCCTCATGCTCTCAGGAACTGACGAGGAACTCGGTTTTTTTGACAC5760              AGGAAGCTGGAAGGAATATCTTGCTGGCTGGGGGAAGAGTGTTGTTATTGGCCACGGTCG5820              CCTTGGTGGCATTCCTATGGGTGCTATTGCCGTGGAGACCCGGCTTGTTGAGAAGATTAT5880              CCCTGCAGATCCAGCAGACCCCAACTCCCGCGAAGCTGTCATGCCCCAGGCTGGACAAGT5940              TCTTTTCCCTGACTCATCCTACAAGACAGCCCAAGCTCTCCGCGACTTTAATAACGAGGG6000              CCTCCCTGTGATGATTTTCGGCAACTGGCGTGGATTTAGTGGTGGAAGTCGTGACATGTC6060              TGGTGAAATCCTCAAATTTGGATCCATGATTGTCGATTCACTCCGAGAGTACAAACATCC6120              TATTTACATATACTTCCCTCCATATGGTGAACTTCGAGGAGGATCGTGGGTTGTGGTGGA6180              CCCCACTATCAATGAGGACAAGATGACCATGTTCTCAGATCCTGATGCTCGTGGTGGTAT6240              TCTCGAACCTGCTGGTATTGTAGAAATCAAGTTCCGCTTGGCAGACCAGCTGAAAGCCAT6300              GCACCGCATTGATCCCCAGCTGAAGATGCTAGATTCAGAGCTTGAGTCGACAGACGACAC6360              AGATGTCGCTGCTCAAGAAGCAATCAAAGAGCAGATTGCTGCAAGAGAGGAGCTTCTTAA6420              ACCCGTCTATCTTCAGGCTGCTACTGAATTTGCTGATCTCCACGACAAGACGGGACGGAT6480              GAAGGCGAAGGGTGTTATCAAAGAAGCAGTTCCATGGGCTCGCTCTCGTGAATACTTCTT6540              TTATCTTGCTAAGCGCCGCATTTTTCAAGACAACTATGTGTTGCAAATCACTGCTGCTGA6600              TCCTTCGTTAGACTCTAAGGCTGCTCTTGAGGTGTTGAAGAACATGTGCACTGCAGACTG6660              GGATGACAACAAAGCCGTTCTTGACTATTATCTGTCCAGCGATGGAGACATCACAGCCAA6720              
GATTAGCGAGATGAAGAAGGCAGCTATCAAGGCACAGATCGAGCAGCTTCAGAAAGCTTT6780              GGAGGGTTGA6790                                                                (2) INFORMATION FOR SEQ ID NO:23:                                             (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 2089 amino acids                                                  (B) TYPE: amino acid                                                          (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                          (ii) MOLECULE TYPE: protein                                                   (iii) HYPOTHETICAL: NO                                                        (iv) ANTI-SENSE: NO                                                           (xi) SEQUENCE DESCRIPTION: SEQ ID NO:23:                                      MetAlaLeuArgArgGlyLeuTyrAlaAlaAlaAlaThrAlaIleLeu                              151015                                                                        ValThrAlaSerValThrAlaPheAlaProGlnHisSerThrPheThr                              202530                                                                        ProGlnSerLeuSerAlaAlaProThrArgAsnValPheGlyGlnIle                              354045                                                                        LysSerAlaPhePheAsnHisAspValAlaThrSerArgThrIleLeu                              505560                                                                        HisAlaAlaThrLeuAspGluThrValLeuSerAlaSerAspSerVal                              65707580                                                                      AlaLysSerValGluAspTyrValLysSerArgGlyGlyAsnArgVal                              859095                                                                        IleArgLysValLeuIleAlaAsnAsnGlyMetAlaAlaThrLysSer                              100105110                                         
                            IleLeuSerMetArgGlnTrpAlaTyrMetGluPheGlyAspGluArg                              115120125                                                                     AlaIleGlnPheValAlaMetAlaThrProGluAspLeuLysAlaAsn                              130135140                                                                     AlaGluPheIleArgLeuAlaAspSerPheValGluValProGlyGly                              145150155160                                                                  LysAsnLeuAsnAsnTyrAlaAsnValAspValIleThrArgIleAla                              165170175                                                                     LysGluGlnGlyValAspAlaValTrpProGlyTrpGlyHisAlaSer                              180185190                                                                     GluAsnProLysLeuProAsnAlaLeuAspLysLeuGlyIleLysPhe                              195200205                                                                     IleGlyProThrGlyProValMetSerValLeuGlyAspLysIleAla                              210215220                                                                     AlaAsnIleLeuAlaGlnThrAlaLysValProSerIleProTrpSer                              225230235240                                                                  GlySerPheGlyGlyProAspAspGlyProLeuGlnAlaAspLeuThr                              245250255                                                                     GluGluGlyThrIleProMetGluIlePheAsnLysGlyLeuValThr                              260265270                                                                     SerAlaAspGluAlaValIleValAlaAsnLysIleGlyTrpGluAsn                              275280285                                                                     GlyIleMetIleLysAlaSerGluGlyGlyGlyGlyLysGlyIleArg                              290295300                                                                     PheValAspAsnGluAlaAspLeuArgAsnAlaPheValGlnValSer                              305310315320          
                                                        AsnGluValIleGlySerProIlePheLeuMetGlnLeuCysLysAsn                              325330335                                                                     AlaArgHisIleGluValGlnIleValGlyAspGlnHisGlyAsnAla                              340345350                                                                     ValAlaLeuAsnGlyArgAspCysSerThrGlnArgArgPheGlnLys                              355360365                                                                     IlePheGluGluGlyProProSerIleValProLysGluThrPheHis                              370375380                                                                     GluMetGluLeuAlaAlaGlnArgLeuThrGlnAsnIleGlyTyrGln                              385390395400                                                                  GlyAlaGlyThrValGluTyrLeuTyrAsnAlaAlaAspAsnLysPhe                              405410415                                                                     PhePheLeuGluLeuAsnProArgLeuGlnValGluHisProValThr                              420425430                                                                     GluGlyIleThrGlyAlaAsnLeuProAlaThrGlnLeuGlnValAla                              435440445                                                                     MetGlyIleProLeuPheAsnIleProAspIleArgArgLeuTyrGly                              450455460                                                                     ArgGluAspAlaTyrGlyThrAspProIleAspPheLeuGlnGluArg                              465470475480                                                                  TyrArgGluLeuAspSerHisValIleAlaAlaArgIleThrAlaGlu                              485490495                                                                     AsnProAspGluGlyPheLysProThrSerGlySerIleGluArgIle                              500505510                                                                     LysPheGlnSerThrProAsnValTrpGlyTyrPheSerValGlyAla                        
      515520525                                                                     AsnGlyGlyIleHisGluPheAlaAspSerGlnPheGlyHisLeuPhe                              530535540                                                                     AlaLysGlyProAsnArgGluGlnAlaArgLysAlaLeuValLeuAla                              545550555560                                                                  LeuLysGluMetGluValArgGlyAspIleArgAsnSerValGluTyr                              565570575                                                                     LeuValLysLeuLeuGluThrGluAlaPheLysLysAsnThrIleAsp                              580585590                                                                     ThrSerTrpLeuAspGlyIleIleLysGluLysSerValLysValGlu                              595600605                                                                     MetProSerHisLeuValValValGlyAlaAlaValPheLysAlaPhe                              610615620                                                                     GluHisValLysValAlaThrGluGluValLysGluSerPheArgLys                              625630635640                                                                  GlyGlnValSerThrAlaGlyIleProGlyIleAsnSerPheAsnIle                              645650655                                                                     GluValAlaTyrLeuAspThrLysTyrProPheHisValGluArgIle                              660665670                                                                     SerProAspValTyrArgPheThrLeuAspGlyAsnThrIleAspVal                              675680685                                                                     GluValThrGlnThrAlaGluGlyAlaLeuLeuAlaThrPheGlyGly                              690695700                                                                     GluThrHisArgIlePheGlyMetAspGluProLeuGlyLeuArgLeu                              705710715720                                                                  
SerLeuAspGlyAlaThrValLeuMetProThrIlePheAspProSer                              725730735                                                                     GluLeuArgThrAspValThrGlyLysValValArgTyrLeuGlnAsp                              740745750                                                                     AsnGlyAlaThrValGluAlaGlyGlnProTyrValGluValGluAla                              755760765                                                                     MetLysMetIleMetProIleLysAlaThrGluSerGlyLysIleThr                              770775780                                                                     HisAsnLeuSerAlaGlySerValIleSerAlaGlyAspLeuLeuAla                              785790795800                                                                  SerLeuGluLeuLysAspProSerArgValLysLysIleGluThrPhe                              805810815                                                                     SerGlyLysLeuAspIleMetGluSerLysValAspLeuGluProGln                              820825830                                                                     LysAlaValMetAsnValLeuSerGlyPheAsnLeuAspProGluAla                              835840845                                                                     ValAlaGlnGlnAlaIleAspSerAlaThrAspSerSerAlaAlaAla                              850855860                                                                     AspLeuLeuValGlnValLeuAspGluPheTyrArgValGluSerGln                              865870875880                                                                  PheAspGlyValIleAlaAspAspValValArgThrLeuThrLysAla                              885890895                                                                     AsnThrGluThrLeuAspValValIleSerGluAsnLeuAlaHisGln                              900905910                                                                     GlnLeuLysArgArgSerGlnLeuLeuLeuAlaMetIleArgGlnLeu                              915920925                                         
                            AspThrPheGlnAspArgPheGlyArgGluValProAspAlaValIle                              930935940                                                                     GluAlaLeuSerArgLeuSerThrLeuLysAspLysSerTyrGlyGlu                              945950955960                                                                  IleIleLeuAlaAlaGluGluArgValArgGluAlaLysValProSer                              965970975                                                                     PheGluValArgArgAlaAspLeuArgAlaLysLeuAlaAspProGlu                              980985990                                                                     ThrAspLeuIleAspLeuSerLysSerSerThrLeuSerAlaGlyVal                              99510001005                                                                   AspLeuLeuThrAsnLeuPheAspAspGluAspGluSerValArgAla                              101010151020                                                                  AlaAlaMetGluValTyrThrArgArgValTyrArgThrTyrAsnIle                              1025103010351040                                                              ProGluLeuThrValGlyValGluAsnGlyArgLeuSerCysSerPhe                              104510501055                                                                  SerPheGlnPheAlaAspValProAlaLysAspArgValThrArgGln                              106010651070                                                                  GlyPhePheSerValIleAspAspAlaSerLysPheAlaGlnGlnLeu                              107510801085                                                                  ProGluIleLeuAsnSerPheGlySerLysIleAlaGlyAspAlaSer                              109010951100                                                                  LysGluGlyProValAsnValLeuGlnValGlyAlaLeuSerGlyAsp                              1105111011151120                                                              IleSerIleGluAspLeuGluLysAlaThrSerAlaAsnLysAspLys                              112511301135          
                                                        LeuAsnMetLeuGlyValArgThrValThrAlaLeuIleProArgGly                              114011451150                                                                  LysLysAspProSerTyrTyrSerPheProGlnCysSerGlyPheLys                              115511601165                                                                  GluAspProLeuArgArgGlyMetArgProThrPheHisHisLeuLeu                              117011751180                                                                  GluLeuGlyArgLeuGluGluAsnPheAlaLeuGluArgIleProAla                              1185119011951200                                                              ValGlyArgAsnValGlnIleTyrValGlySerGluLysThrAlaArg                              120512101215                                                                  ArgAsnAlaAlaGlnValValPheLeuArgAlaIleSerHisThrPro                              122012251230                                                                  GlyLeuThrThrPheSerGlyAlaArgArgAlaLeuLeuGlnGlyLeu                              123512401245                                                                  AspGluLeuGluArgAlaGlnAlaAsnSerLysValSerValGlnSer                              125012551260                                                                  SerSerArgIleTyrLeuHisSerLeuProGluGlnSerAspAlaThr                              1265127012751280                                                              ProGluGluIleAlaLysGluPheGluGlyValIleAspLysLeuLys                              128512901295                                                                  SerArgLeuAlaGlnArgLeuThrLysLeuArgValAspGluIleGlu                              130013051310                                                                  ThrLysValArgValThrValGlnAspGluAspGlySerProArgVal                              131513201325                                                                  ValProValArgLeuValAlaSerSerMetGlnGlyGluTrpLeuLys                        
      133013351340                                                                  ThrSerAlaTyrIleAspArgProAspProValThrGlyValThrArg                              1345135013551360                                                              GluArgCysValIleGlyGluGlyIleAspGluValCysGluLeuGlu                              136513701375                                                                  SerTyrAspSerThrSerThrIleGlnThrLysArgSerIleAlaArg                              138013851390                                                                  ArgValGlySerThrTyrAlaTyrAspTyrLeuGlyLeuLeuGluVal                              139514001405                                                                  SerLeuLeuGlyGluTrpAspLysTyrLeuSerSerLeuSerGlyPro                              141014151420                                                                  AspThrProThrIleProSerAsnValPheGluAlaGlnGluLeuLeu                              1425143014351440                                                              GluGlyProAspGlyGluLeuValThrGlyLysArgGluIleGlyThr                              144514501455                                                                  AsnLysValGlyMetValAlaTrpValValThrMetLysThrProGlu                              146014651470                                                                  TyrProGluGlyArgGlnValValValIleValAsnAspValThrVal                              147514801485                                                                  GlnSerGlySerPheGlyValGluGluAspGluValPhePheLysAla                              149014951500                                                                  SerLysTyrAlaArgGluAsnLysLeuProArgValTyrIleAlaCys                              1505151015151520                                                              AsnSerGlyAlaArgIleGlyLeuValAspAspLeuLysProLysPhe                              152515301535                                                                  
GlnIleLysPheIleAspGluAlaSerProSerLysGlyPheGluTyr                              154015451550                                                                  LeuTyrLeuAspAspAlaThrTyrLysSerLeuProGluGlySerVal                              155515601565                                                                  AsnValArgLysValProGluGlyTrpAlaIleThrAspIleIleGly                              157015751580                                                                  ThrAsnGluGlyIleGlyValGluAsnLeuGlnGlySerGlyLysIle                              1585159015951600                                                              AlaGlyGluThrSerArgAlaTyrAspGluIlePheThrLeuSerTyr                              160516101615                                                                  ValThrGlyArgSerValGlyIleGlyAlaTyrLeuValArgLeuGly                              162016251630                                                                  GlnArgIleIleGlnMetLysGlnGlyProMetIleLeuThrGlyTyr                              163516401645                                                                  GlyAlaLeuAsnLysLeuLeuGlyArgGluValTyrAsnSerAsnAsp                              165016551660                                                                  GlnLeuGlyGlyProGlnValMetPheProAsnGlyCysSerHisGlu                              1665167016751680                                                              IleValAspAspAspGlnGlnGlyIleGlnSerIleIleGlnTrpLeu                              168516901695                                                                  SerPheValProLysThrThrAspAlaValSerProValArgGluCys                              170017051710                                                                  AlaAspProValAsnArgAspValGlnTrpArgProThrProThrPro                              171517201725                                                                  TyrAspProArgLeuMetLeuSerGlyThrAspGluGluLeuGlyPhe                              173017351740                                      
                            PheAspThrGlySerTrpLysGluTyrLeuAlaGlyTrpGlyLysSer                              1745175017551760                                                              ValValIleGlyArgGlyArgLeuGlyGlyIleProMetGlyAlaIle                              176517701775                                                                  AlaValGluThrArgLeuValGluLysIleIleProAlaAspProAla                              178017851790                                                                  AspProAsnSerArgGluAlaValMetProGlnAlaGlyGlnValLeu                              179518001805                                                                  PheProAspSerSerTyrLysThrAlaGlnAlaLeuArgAspPheAsn                              181018151820                                                                  AsnGluGlyLeuProValMetIlePheAlaAsnTrpArgGlyPheSer                              1825183018351840                                                              GlyGlySerArgAspMetSerGlyGluIleLeuLysPheGlySerMet                              184518501855                                                                  IleValAspSerLeuArgGluTyrLysHisProIleTyrIleTyrPhe                              186018651870                                                                  ProProTyrGlyGluLeuArgGlyGlySerTrpValValValAspPro                              187518801885                                                                  ThrIleAsnGluAspLysMetThrMetPheSerAspProAspAlaArg                              189018951900                                                                  GlyGlyIleLeuGluProAlaGlyIleValGluIleLysPheArgLeu                              1905191019151920                                                              AlaAspGlnLeuLysAlaMetHisArgIleAspProGlnLeuLysMet                              192519301935                                                                  LeuAspSerGluLeuGluSerThrAspAspThrAspValAlaAlaGln                              194019451950          
                                                        GluAlaIleLysGluGlnIleAlaAlaArgGluGluLeuLeuLysPro                              195519601965                                                                  ValTyrLeuGlnAlaAlaThrGluPheAlaAspLeuHisAspLysThr                              197019751980                                                                  GlyArgMetLysAlaLysGlyValIleLysGluAlaValProTrpAla                              1985199019952000                                                              ArgSerArgGluTyrPhePheTyrLeuAlaLysArgArgIlePheGln                              200520102015                                                                  AspAsnTyrValLeuGlnIleThrAlaAlaAspProSerLeuAspSer                              202020252030                                                                  LysAlaAlaLeuGluValLeuLysAsnMetCysThrAlaAspTrpAsp                              203520402045                                                                  AspAsnLysAlaValLeuAspTyrTyrLeuSerSerAspGlyAspIle                              205020552060                                                                  ThrAlaLysIleSerGluMetLysLysAlaAlaIleLysAlaGlnIle                              2065207020752080                                                              GluGlnLeuGlnLysAlaLeuGluGly                                                   2085                                                                          (2) INFORMATION FOR SEQ ID NO:24:                                             (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 2089 amino acids                                                  (B) TYPE: amino acid                                                          (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                          (ii) MOLECULE TYPE: protein                                             
      (iii) HYPOTHETICAL: NO                                                        (iv) ANTI-SENSE: NO                                                           (v) FRAGMENT TYPE: N-terminal                                                 (xi) SEQUENCE DESCRIPTION: SEQ ID NO:24:                                      MetAlaLeuArgArgGlyLeuTyrAlaAlaAlaAlaThrAlaIleLeu                              151015                                                                        ValThrAlaSerValThrAlaPheAlaProGlnHisSerThrPheThr                              202530                                                                        ProGlnSerLeuSerAlaAlaProThrArgAsnValPheGlyGlnIle                              354045                                                                        LysSerAlaPhePheAsnHisAspValAlaThrSerArgThrIleLeu                              505560                                                                        HisAlaAlaThrLeuAspGluThrValLeuSerAlaSerAspSerVal                              65707580                                                                      AlaLysSerValGluAspTyrValLysSerArgGlyGlyAsnArgVal                              859095                                                                        IleArgLysValLeuIleAlaAsnAsnGlyMetAlaAlaThrLysSer                              100105110                                                                     IleLeuSerMetArgGlnTrpAlaTyrMetGluPheGlyAspGluArg                              115120125                                                                     AlaIleGlnPheValAlaMetAlaThrProGluAspLeuLysAlaAsn                              130135140                                                                     AlaGluPheIleArgLeuAlaAspSerPheValGluValProGlyGly                              145150155160                                                                  LysAsnLeuAsnAsnTyrAlaAsnValAspValIleThrArgIleAla                              165170175                                   
                                  LysGluGlnGlyValAspAlaValTrpProGlyTrpGlyHisAlaSer                              180185190                                                                     GluAsnProLysLeuProAsnAlaLeuAspLysLeuGlyIleLysPhe                              195200205                                                                     IleGlyProThrGlyProValMetSerValLeuGlyAspLysIleAla                              210215220                                                                     AlaAsnIleLeuAlaGlnThrAlaLysValProSerIleProTrpSer                              225230235240                                                                  GlySerPheGlyGlyProAspAspGlyProLeuGlnAlaAspLeuThr                              245250255                                                                     GluGluGlyThrIleProMetGluIlePheAsnLysGlyLeuValThr                              260265270                                                                     SerAlaAspGluAlaValIleValAlaAsnLysIleGlyTrpGluAsn                              275280285                                                                     GlyIleMetIleLysAlaSerGluGlyGlyGlyGlyLysGlyIleArg                              290295300                                                                     PheValAspAsnGluAlaAspLeuArgAsnAlaPheValGlnValSer                              305310315320                                                                  AsnGluValIleGlySerProIlePheLeuMetGlnLeuCysLysAsn                              325330335                                                                     AlaArgHisIleGluValGlnIleValGlyAspGlnHisGlyAsnAla                              340345350                                                                     ValAlaLeuAsnGlyArgAspCysSerThrGlnArgArgPheGlnLys                              355360365                                                                     IlePheGluGluGlyProProSerIleValProLysGluThrPheHis                              370375380       
                                                              GluMetGluLeuAlaAlaGlnArgLeuThrGlnAsnIleGlyTyrGln                              385390395400                                                                  GlyAlaGlyThrValGluTyrLeuTyrAsnAlaAlaAspAsnLysPhe                              405410415                                                                     PhePheLeuGluLeuAsnProArgLeuGlnValGluHisProValThr                              420425430                                                                     GluGlyIleThrGlyAlaAsnLeuProAlaThrGlnLeuGlnValAla                              435440445                                                                     MetGlyIleProLeuPheAsnIleProAspIleArgArgLeuTyrGly                              450455460                                                                     ArgGluAspAlaTyrGlyThrAspProIleAspPheLeuGlnGluArg                              465470475480                                                                  TyrArgGluLeuAspSerHisValIleAlaAlaArgIleThrAlaGlu                              485490495                                                                     AsnProAspGluGlyPheLysProThrSerGlySerIleGluArgIle                              500505510                                                                     LysPheGlnSerThrProAsnValTrpGlyTyrPheSerValGlyAla                              515520525                                                                     AsnGlyGlyIleHisGluPheAlaAspSerGlnPheGlyHisLeuPhe                              530535540                                                                     AlaLysGlyProAsnArgGluGlnAlaArgLysAlaLeuValLeuAla                              545550555560                                                                  LeuLysGluMetGluValArgGlyAspIleArgAsnSerValGluTyr                              565570575                                                                     LeuValLysLeuLeuGluThrGluAlaPheLysLysAsnThrIleAsp                  
            580585590                                                                     ThrSerTrpLeuAspGlyIleIleLysGluLysSerValLysValGlu                              595600605                                                                     MetProSerHisLeuValValValGlyAlaAlaValPheLysAlaPhe                              610615620                                                                     GluHisValLysValAlaThrGluGluValLysGluSerPheArgLys                              625630635640                                                                  GlyGlnValSerThrAlaGlyIleProGlyIleAsnSerPheAsnIle                              645650655                                                                     GluValAlaTyrLeuAspThrLysTyrProPheHisValGluArgIle                              660665670                                                                     SerProAspValTyrArgPheThrLeuAspGlyAsnThrIleAspVal                              675680685                                                                     GluValThrGlnThrAlaGluGlyAlaLeuLeuAlaThrPheGlyGly                              690695700                                                                     GluThrHisArgIlePheGlyMetAspGluProLeuGlyLeuArgLeu                              705710715720                                                                  SerLeuAspGlyAlaThrValLeuMetProThrIlePheAspProSer                              725730735                                                                     GluLeuArgThrAspValThrGlyLysValValArgTyrLeuGlnAsp                              740745750                                                                     AsnGlyAlaThrValGluAlaGlyGlnProTyrValGluValGluAla                              755760765                                                                     MetLysMetIleMetProIleLysAlaThrGluSerGlyLysIleThr                              770775780                                                                     
HisAsnLeuSerAlaGlySerValIleSerAlaGlyAspLeuLeuAla                              785790795800                                                                  SerLeuGluLeuLysAspProSerArgValLysLysIleGluThrPhe                              805810815                                                                     SerGlyLysLeuAspIleMetGluSerLysValAspLeuGluProGln                              820825830                                                                     LysAlaValMetAsnValLeuSerGlyPheAsnLeuAspProGluAla                              835840845                                                                     ValAlaGlnGlnAlaIleAspSerAlaThrAspSerSerAlaAlaAla                              850855860                                                                     AspLeuLeuValGlnValLeuAspGluPheTyrArgValGluSerGln                              865870875880                                                                  PheAspGlyValIleAlaAspAspValValArgThrLeuThrLysAla                              885890895                                                                     AsnThrGluThrLeuAspValValIleSerGluAsnLeuAlaHisGln                              900905910                                                                     GlnLeuLysArgArgSerGlnLeuLeuLeuAlaMetIleArgGlnLeu                              915920925                                                                     AspThrPheGlnAspArgPheGlyArgGluValProAspAlaValIle                              930935940                                                                     GluAlaLeuSerArgLeuSerThrLeuLysAspLysSerTyrGlyGlu                              945950955960                                                                  IleIleLeuAlaAlaGluGluArgValArgGluAlaLysValProSer                              965970975                                                                     PheGluValArgArgAlaAspLeuArgAlaLysLeuAlaAspProGlu                              980985990                                         
                            ThrAspLeuIleAspLeuSerLysSerSerThrLeuSerAlaGlyVal                              99510001005                                                                   AspLeuLeuThrAsnLeuPheAspAspGluAspGluSerValArgAla                              101010151020                                                                  AlaAlaMetGluValTyrThrArgArgValTyrArgThrTyrAsnIle                              1025103010351040                                                              ProGluLeuThrValGlyValGluAsnGlyArgLeuSerCysSerPhe                              104510501055                                                                  SerPheGlnPheAlaAspValProAlaLysAspArgValThrArgGln                              106010651070                                                                  GlyPhePheSerValIleAspAspAlaSerLysPheAlaGlnGlnLeu                              107510801085                                                                  ProGluIleLeuAsnSerPheGlySerLysIleAlaGlyAspAlaSer                              109010951100                                                                  LysGluGlyProValAsnValLeuGlnValGlyAlaLeuSerGlyAsp                              1105111011151120                                                              IleSerIleGluAspLeuGluLysAlaThrSerAlaAsnLysAspLys                              112511301135                                                                  LeuAsnMetLeuGlyValArgThrValThrAlaLeuIleProArgGly                              114011451150                                                                  LysLysAspProSerTyrTyrSerPheProGlnCysSerGlyPheLys                              115511601165                                                                  GluAspProLeuArgArgGlyMetArgProThrPheHisHisLeuLeu                              117011751180                                                                  GluLeuGlyArgLeuGluGluAsnPheAlaLeuGluArgIleProAla                              1185119011951200      
                                                        ValGlyArgAsnValGlnIleTyrValGlySerGluLysThrAlaArg                              120512101215                                                                  ArgAsnAlaAlaGlnValValPheLeuArgAlaIleSerHisThrPro                              122012251230                                                                  GlyLeuThrThrPheSerGlyAlaArgArgAlaLeuLeuGlnGlyLeu                              123512401245                                                                  AspGluLeuGluArgAlaGlnAlaAsnSerLysValSerValGlnSer                              125012551260                                                                  SerSerArgIleTyrLeuHisSerLeuProGluGlnSerAspAlaThr                              1265127012751280                                                              ProGluGluIleAlaLysGluPheGluGlyValIleAspLysLeuLys                              128512901295                                                                  SerArgLeuAlaGlnArgLeuThrLysLeuArgValAspGluIleGlu                              130013051310                                                                  ThrLysValArgValThrValGlnAspGluAspGlySerProArgVal                              131513201325                                                                  ValProValArgLeuValAlaSerSerMetGlnGlyGluTrpLeuLys                              133013351340                                                                  ThrSerAlaTyrIleAspArgProAspProValThrGlyValThrArg                              1345135013551360                                                              GluArgCysValIleGlyGluGlyIleAspGluValCysGluLeuGlu                              136513701375                                                                  SerTyrAspSerThrSerThrIleGlnThrLysArgSerIleAlaArg                              138013851390                                                                  ArgValGlySerThrTyrAlaTyrAspTyrLeuGlyLeuLeuGluVal                        
      139514001405                                                                  SerLeuLeuGlyGluTrpAspLysTyrLeuSerSerLeuSerGlyPro                              141014151420                                                                  AspThrProThrIleProSerAsnValPheGluAlaGlnGluLeuLeu                              1425143014351440                                                              GluGlyProAspGlyGluLeuValThrGlyLysArgGluIleGlyThr                              144514501455                                                                  AsnLysValGlyMetValAlaTrpValValThrMetLysThrProGlu                              146014651470                                                                  TyrProGluGlyArgGlnValValValIleValAsnAspValThrVal                              147514801485                                                                  GlnSerGlySerPheGlyValGluGluAspGluValPhePheLysAla                              149014951500                                                                  SerLysTyrAlaArgGluAsnLysLeuProArgValTyrIleAlaCys                              1505151015151520                                                              AsnSerGlyAlaArgIleGlyLeuValAspAspLeuLysProLysPhe                              152515301535                                                                  GlnIleLysPheIleAspGluAlaSerProSerLysGlyPheGluTyr                              154015451550                                                                  LeuTyrLeuAspAspAlaThrTyrLysSerLeuProGluGlySerVal                              155515601565                                                                  AsnValArgLysValProGluGlyTrpAlaIleThrAspIleIleGly                              157015751580                                                                  ThrAsnGluGlyIleGlyValGluAsnLeuGlnGlySerGlyLysIle                              1585159015951600                                                              
AlaGlyGluThrSerArgAlaTyrAspGluIlePheThrLeuSerTyr                              160516101615                                                                  ValThrGlyArgSerValGlyIleGlyAlaTyrLeuValArgLeuGly                              162016251630                                                                  GlnArgIleIleGlnMetLysGlnGlyProMetIleLeuThrGlyTyr                              163516401645                                                                  GlyAlaLeuAsnLysLeuLeuGlyArgGluValTyrAsnSerAsnAsp                              165016551660                                                                  GlnLeuGlyGlyProGlnValMetPheProAsnGlyCysSerHisGlu                              1665167016751680                                                              IleValAspAspAspGlnGlnGlyIleGlnSerIleIleGlnTrpLeu                              168516901695                                                                  SerPheValProLysThrThrAspAlaValSerProValArgGluCys                              170017051710                                                                  AlaAspProValAsnArgAspValGlnTrpArgProThrProThrPro                              171517201725                                                                  TyrAspProArgLeuMetLeuSerGlyThrAspGluGluLeuGlyPhe                              173017351740                                                                  PheAspThrGlySerTrpLysGluTyrLeuAlaGlyTrpGlyLysSer                              1745175017551760                                                              ValValIleGlyArgGlyArgLeuGlyGlyIleProMetGlyAlaIle                              176517701775                                                                  AlaValGluThrArgLeuValGluLysIleIleProAlaAspProAla                              178017851790                                                                  AspProAsnSerArgGluAlaValMetProGlnAlaGlyGlnValLeu                              179518001805                                      
                            PheProAspSerSerTyrLysThrAlaGlnAlaLeuArgAspPheAsn                              181018151820                                                                  AsnGluGlyLeuProValMetIlePheAlaAsnTrpArgGlyPheSer                              1825183018351840                                                              GlyGlySerArgAspMetSerGlyGluIleLeuLysPheGlySerMet                              184518501855                                                                  IleValAspSerLeuArgGluTyrLysHisProIleTyrIleTyrPhe                              186018651870                                                                  ProProTyrGlyGluLeuArgGlyGlySerTrpValValValAspPro                              187518801885                                                                  ThrIleAsnGluAspLysMetThrMetPheSerAspProAspAlaArg                              189018951900                                                                  GlyGlyIleLeuGluProAlaGlyIleValGluIleLysPheArgLeu                              1905191019151920                                                              AlaAspGlnLeuLysAlaMetHisArgIleAspProGlnLeuLysMet                              192519301935                                                                  LeuAspSerGluLeuGluSerThrAspAspThrAspValAlaAlaGln                              194019451950                                                                  GluAlaIleLysGluGlnIleAlaAlaArgGluGluLeuLeuLysPro                              195519601965                                                                  ValTyrLeuGlnAlaAlaThrGluPheAlaAspLeuHisAspLysThr                              197019751980                                                                  GlyArgMetLysAlaLysGlyValIleLysGluAlaValProTrpAla                              1985199019952000                                                              ArgSerArgGluTyrPhePheTyrLeuAlaLysArgArgIlePheGln                              200520102015          
AspAsnTyrValLeuGlnIleThrAlaAlaAspProSerLeuAspSer
2020 2025 2030
LysAlaAlaLeuGluValLeuLysAsnMetCysThrAlaAspTrpAsp
2035 2040 2045
AspAsnLysAlaValLeuAspTyrTyrLeuSerSerAspGlyAspIle
2050 2055 2060
ThrAlaLysIleSerGluMetLysLysAlaAlaIleLysAlaGlnIle
2065 2070 2075 2080
GluGlnLeuGlnLysAlaLeuGluGly
2085

(2) INFORMATION FOR SEQ ID NO:25:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 6270 bases
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:25:
ATGGCTCTCCGTAGGGGCCTTTACGCTGCTGCAGCGACTGCCATCTTGGTCACGGCTTC 60
GTGACCGCTTTTGCTCCTCAGCATTCGACATTCACCCCCCAATCGCTCTCGGCGGCAC 120
ACGCGCAACGTCTTCGGCCAGATCAAAAGCGCCTTCTTCAACCATGATGTTGCCACCT 180
CGAACCATTCTTCACGCCGCGACACTAGATGAAACTGTTCTTTCCGCTTCAGACTCCG 240
GCCAAATCTGTCGAAGACTACGTGAAATCCCGTGGTGGAAATCGCGTCATTCGTAAAG 300
      CTCATCGCCAACAACGGCATGGCCGCGACAAAGTCCATCCTCTCCATGCGTCAATGGG360                 TACATGGAATTCGGGGACGAACGTGCCATCCAGTTCGTTGCGATGGCGACTCCCGAGG420                 TTGAAGGCGAACGCCGAATTTATTCGCTTGGCGGATTCTTTCGTCGAGGTACCGGGAG480                 AAGAACTTGAACAACTACGCCAACGTCGATGTCATTACCCGCATCGCTAAGGAGCAGG540                 GTTGATGCCGTTTGGCCTGGATGGGGTCATGCATCTGAGAATCCGAAGCTCCCTAATG600                 CTTGACAAATTGGGAATCAAGTTCATTGGACCAACTGGGCCTGTCATGAGCGTTTTGG660                 GACAAGATTGCTGCGAACATTCTAGCACAGACAGCGAAAGTCCCCTCCATTCCCTGGA720                 GGATCCTTTGGTGGACCAGACGATGGACCCCTTCAGGCGGATCTGACCGAGGAGGGTA780                 ATCCCAATGGAAATCTTTAACAAGGGATTAGTAACCTCTGCTGATGAAGCCGTCATTG840                 GCGAACAAGATTGGCTGGGAGAACGGAATCATGATCAAGGCTTCTGAGGGTGGAGGAG900                 AAGGGTATACGCTTTGTCGACAATGAGGCCGACTTACGGAACGCGTTCGTTCAGGTGT960                 AATGAAGTGATTGGCTCTCCTATTTTCCTCATGCAGTTGTGTAAGAACGCTCGTCAC1020                 GAAGTGCAAATTGTTGGCGACCAGCACGGAAATGCTGTAGCGTTGAACGGTCGAGAT1080                 TCCACTCAGCGTCGCTTCCAGAAGATCTTCGAGGAAGGTCCTCCGTCCATTGTACCG1140                 GAAACATTCCACGAGATGGAACTTGCGGCTCAACGGTTGACTCAAAACATTGGGTAT1200                 GGTGCTGGAACTGTGGAATACTTGTACAACGCCGCTGACAATAAGTTTTTCTTCCTT1260                 TTGAACCCCCGTCTCCAAGTGGAGCATCCTGTGACTGAAGGAATTACCGGCGCTAAT1320                 CCTGCCACTCAGCTTCAAGTTGCTATGGGTATTCCTCTCTTCAACATTCCTGACATT1380                 CGTCTCTATGGAAGAGAGGATGCTTACGGAACGGATCCCATTGATTTTCTTCAAGAA1440                 TACCGCGAACTCGACTCTCATGTAATTGCTGCCCGCATCACTGCTGAAAACCCCGAT1500                 GGATTCAAACCCACCTCAGGCTCAATTGAGCGAATCAAATTTCAATCCACCCCAAAT1560                 TGGGGATATTTCTCTGTTGGTGCTAACGGTGGAATCCATGAATTTGCCGACTCTCAG1620                 GGCCATCTTTTCGCTAAGGGTCCGAACCGTGAGCAAGCCCGCAAGGCATTGGTTTTG1680                 CTTAAGGAGATGGAAGTGCGCGGAGACATTCGTAACTCTGTTGAATACCTAGTCAAG1740                 CTCGAAACTGAAGCTTTCAAGAAGAACACTATCGACACGTCTTGGTTAGATGGCATT1800                 
AAGGAGAAGTCCGTTAAAGTTGAGATGCCCTCTCACTTAGTGGTTGTCGGAGCCGCT1860                 TTCAAGGCCTTCGAACATGTTAAGGTGGCCACTGAAGAAGTTAAGGAATCGTTTCGA1920                 GGACAAGTCTCCACTGCAGGGATTCCAGGCATAAACTCGTTCAACATCGAAGTTGCG1980                 TTAGACACGAAGTACCCATTCCACGTAGAACGGATCTCTCCAGATGTTTACAGGTTT2040                 TTGGACGGGAACACGATTGATGTGGAAGTTACCCAAACCGCTGAAGGAGCACTTTTG2100                 ACCTTTGGAGGAGAGACTCATCGTATCTTTGGTATGGACGAACCACTTGGCCTTCGA2160                 TCATTGGACGGGGCAACTGTCCTAATGCCAACAATTTTTGACCCCTCTGAACTCCGC2220                 GATGTGACTGGAAAGGTTGTTCGTTACCTCCAAGACAATGGAGCAACTGTTGAAGCG2280                 CAGCCCTATGTCGAGGTTGAAGCGATGAAGATGATCATGCCAATCAAGGCTACTGAG2340                 GGAAAAATTACTCACAACCTAAGTGCTGGATCTGTAATCTCTGCTGGTGACCTTCTT2400                 TCTCTCGAACTTAAGGATCCCTCTAGGGTTAAGAAAATAGAAACTTTTTCGGGCAAA2460                 GACATTATGGAATCGAAGGTTGACTTAGAACCGCAGAAAGCAGTCATGAATGTCCTC2520                 GGGTTCAACTTAGACCCTGAGGCAGTTGCGCAGCAAGCAATTGACAGTGCTACCGAC2580                 TCTGCCGCAGCCGATCTTCTTGTCCAAGTATTAGACGAATTCTATCGCGTTGAATCT2640                 TTTGATGGTGTCATCGCTGATGATGTTGTCCGCACTCTCACCAAAGCGAACACCGAG2700                 CTTGATGTTGTCATCTCCGAGAACTTGGCCCACCAGCAGCTCAAGAGGCGTAGTCAG2760                 CTCCTCGCTATGATCCGTCAACTTGACACGTTTCAAGACAGATTTGGCAGAGAAGTT2820                 GATGCTGTCATTGAAGCATTGAGTAGGCTTTCTACCTTGAAAGACAAATCTTACGGT2880                 ATCATTCTTGCGGCTGAGGAGAGAGTCCGCGAAGCCAAGGTGCCGTCCTTCGAAGTG2940                 CGTGCTGATTTGCGTGCAAAGCTTGCTGACCCGGAGACAGATTTGATTGACCTGAGT3000                 AGCTCAACACTCTCAGCAGGGGTTGACCTTCTCACAAATCTTTTTGATGACGAAGAT3060                 TCTGTCCGCGCTGCTGCTATGGAAGTATATACTCGCCGTGTCTACCGTACCTACAAC3120                 CCCGAGCTAACTGTTGGAGTTGAGAATGGCCGCCTCTCATGTAGCTTCTCCTTCCAA3180                 GCTGATGTCCCGGCGAAAGACCGTGTCACCCGCCAAGGGTTCTTCTCAGTTATCGAC3240                 GCTTCAAAGTTCGCGCAACAGCTTCCTGAGATTCTCAACTCGTTTGGATCAAAGATC3300                 
GGGGATGCAAGCAAAGAAGGCCCTGTCAATGTTTTGCAGGTTGGTGCTCTCTCGGGA3360                 ATCAGTATTGAGGACCTCGAGAAAGCTACTTCCGCTAACAAGGACAAGTTGAATATG3420                 GGTGTCCGCACTGTGACGGCTCTTATCCCAAGGGGAAAGAAGGACCCAAGCTATTAT3480                 TTCCCCCAATGCAGTGGCTTCAAGGAGGATCCTCTTCGCAGAGGCATGCGCCCAACC3540                 CATCATCTCCTGGAACTCGGACGGCTGGAGGAAAACTTTGCTCTTGAACGAATTCCT3600                 GTTGGACGCAACGTACAGATTTATGTTGGTTCCGAGAAGACGGCAAGGCGAAATGCA3660                 CAAGTTGTTTTCTTGAGAGCTATCTCACATACTCCTGGCCTAACTACCTTCTCTGGT3720                 CGCCGAGCTCTTCTCCAGGGGCTTGACGAATTGGAACGTGCTCAAGCAAACTCAAAG3780                 AGTGTCCAGTCATCGTCTCGCATCTACCTTCACTCTCTCCCAGAACAGTCTGATGCA3840                 CCCGAGGAGATTGCTAAAGAATTCGAAGGTGTCATTGACAAGCTAAAGAGTCGATTG3900                 CAACGTCTTACGAAACTGCGTGTGGATGAGATTGAAACCAAGGTTCGCGTGACTGTC3960                 GATGAAGACGGTAGTCCCAGGGTTGTGCCTGTACGCCTTGTGGCTTCTTCAATGCAA4020                 GAATGGCTTAAAACATCTGCTTACATTGATCGTCCGGACCCGGTCACTGGAGTCACC4080                 GAACGGTGCGTGATTGGAGAAGGCATTGACGAGGTTTGTGAACTTGAGTCGTATGAC4140                 ACCAGTACCATCCAAACAAAGCGCTCAATTGCAAGACGTGTGGGATCTACCTACGCT4200                 GACTACCTTGGACTCCTTGAGGTCAGCTTGCTTGGAGAATGGGATAAGTATCTCAGC4260                 CTCTCAGGACCGGACACCCCTACCATCCCGTCGAATGTTTTTGAAGCTCAAGAGTTA4320                 GAAGGACCTGATGGCGAGCTTGTCACCGGGAAACGTGAAATTGGAACAAATAAGGTT4380                 ATGGTTGCATGGGTGGTAACAATGAAAACACCTGAATATCCTGAGGGTCGACAGGTT4440                 GTAATTGTGAACGATGTCACTGTACAAAGTGGTTCATTTGGAGTTGAGGAGGATGAA4500                 TTCTTCAAGGCCTCCAAATATGCTCGCGAAAATAAGCTCCCCCGTGTCTACATTGCG4560                 AACTCTGGTGCTAGAATTGGTTTGGTGGATGATCTCAAGCCAAAGTTCCAGATCAAA4620                 ATTGATGAGGCGAGTCCATCTAAGGGTTTTGAGTACCTTTATCTTGATGATGCAACG4680                 AAATCTCTTCCAGAAGGGTCGGTAAATGTAAGGAAGGTCCCTGAAGGCTGGGCTATC4740                 GATATCATTGGAACGAACGAAGGAATTGGGGTTGAGAACCTTCAAGGAAGTGGCAAA4800                 
GCTGGCGAGACATCAAGGGCATATGATGAAATCTTCACCTTGAGTTACGTCACAGGT4860                 AGTGTTGGTATTGGAGCTTACCTTGTCCGTCTCGGCCAGCGTATTATTCAGATGAAA4920                 GGACCCATGATTCTCACAGGCTATGGTGCCCTGAATAAGCTTCTCGGCCGTGAAGTG4980                 AACTCAAACGACCAACTTGGTGGTCCTCAAGTCATGTTCCCAAACGGCTGCTCTCAT5040                 ATTGTAGATGATGACCAACAAGGCATCCAGTCCATTATCCAATGGCTAAGCTTTGTT5100                 AAGACAACTGATGCTGTGTCACCCGTCCGTGAATGTGCCGACCCTGTCAACAGGGAT5160                 CAATGGCGCCCTACCCCCACTCCTTATGATCCACGCCTCATGCTCTCAGGAACTGAC5220                 GAACTCGGTTTTTTTGACACAGGAAGCTGGAAGGAATATCTTGCTGGCTGGGGGAAG5280                 GTTGTTATTGGCCGCGGTCGCCTTGGTGGCATTCCTATGGGTGCTATTGCCGTGGAG5340                 CGGCTTGTTGAGAAGATTATCCCTGCAGATCCAGCAGACCCCAACTCCCGCGAAGCT5400                 ATGCCCCAGGCTGGACAAGTTCTTTTCCCTGACTCATCCTACAAGACAGCCCAAGCT5460                 CGCGACTTTAATAACGAGGGCCTCCCTGTGATGATTTTCGGCAACTGGCGTGGATTT5520                 GGTGGAAGTCGTGACATGTCTGGTGAAATCCTCAAATTTGGATCCATGATTGTCGAT5580                 CTCCGAGAGTACAAACATCCTATTTACATATACTTCCCTCCATATGGTGAACTTCGA5640                 GGATCGTGGGTTGTGGTGGACCCCACTATCAATGAGGACAAGATGACCATGTTCTCA5700                 CCTGATGCTCGTGGTGGTATTCTCGAACCTGCTGGTATTGTAGAAATCAAGTTCCGC5760                 GCAGACCAGCTGAAAGCCATGCACCGCATTGATCCCCAGCTGAAGATGCTAGATTCA5820                 CTTGAGTCGACAGACGACACAGATGTCGCTGCTCAAGAAGCAATCAAAGAGCAGATT5880                 GCAAGAGAGGAGCTTCTTAAACCCGTCTATCTTCAGGCTGCTACTGAATTTGCTGAT5940                 CACGACAAGACGGGACGGATGAAGGCGAAGGGTGTTATCAAAGAAGCAGTTCCATGG6000                 CGCTCTCGTGAATACTTCTTTTATCTTGCTAAGCGCCGCATTTTTCAAGACAACTAT6060                 TTGCAAATCACTGCTGCTGATCCTTCGTTAGACTCTAAGGCTGCTCTTGAGGTGTTG6120                 AACATGTGCACTGCAGACTGGGATGACAACAAAGCCGTTCTTGACTATTATCTGTCC6180                 GATGGAGACATCACAGCCAAGATTAGCGAGATGAAGAAGGCAGCTATCAAGGCACAG6240                 GAGCAGCTTCAGAAAGCTTTGGAGGGTTGA6270                                            
__________________________________________________________________________
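
The correspondence between the nucleotide listing (SEQ ID NO:25) and the three-letter amino acid listing (SEQ ID NO:24) can be spot-checked with a short script. The following is a minimal sketch, not part of the original disclosure: it assumes the standard genetic code applies to this coding sequence, and the names CODON_TABLE, THREE_LETTER, translate and to_three_letter are hypothetical helpers introduced only for illustration. It translates the first 48 bases of SEQ ID NO:25 and renders the result in both one-letter and three-letter notation.

```python
from itertools import product

# Standard genetic code (assumption: it applies to this coding sequence).
# Codons are enumerated in TCAG order, with the third base varying fastest.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# One-letter to three-letter amino acid codes, as used in the sequence listing.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon; '*' marks a stop codon."""
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def to_three_letter(protein: str) -> str:
    """Convert one-letter residues to the fused three-letter notation."""
    return "".join(THREE_LETTER[aa] for aa in protein)


# First 48 bases of SEQ ID NO:25 as listed above.
START = "ATGGCTCTCCGTAGGGGCCTTTACGCTGCTGCAGCGACTGCCATCTTG"

if __name__ == "__main__":
    protein = translate(START)
    print(protein)
    print(to_three_letter(protein))
```

Under these assumptions, the printed residues can be compared by eye with the opening residues of SEQ ID NO:24 and with the start of the one-letter sequence recited in the claims.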

What is claimed is:
 1. An isolated and purified DNA encoding an acetyl-coenzyme A carboxylase (ACCase) protein from Cyclotella cryptica having ACCase activity.
 2. The DNA according to claim 1 wherein the amino acid sequence of the encoded protein is:

    __________________________________________________________________________
    MALRRGLYAAAATAILVTASVTAFAPQHSTFTPQSLSAAPTRNVFGQIKSAFFNHDVATS
    RTILHAATLDETVLSASDSVAKSVEDYVKSRGGNRVIRKVLIANNGMAATKSILSMRQW
    AYMEFGDERAIQFVAMATPEDLKANAEFIRLADSFVEVPGGKNLNNYANVDVITRIAKE
    QGVDAVWPGWGHASENPKLPNALDKLGIKFIGPTGPVMSVLGDKIAANILAQTAKVPSIP
    WSGSFGGPDDGPLQADLTEEGTIPMEIFNKGLVTSADEAVIVANKIGWENGIMIKASEGG
    GGKGIRFVDNEADLRNAFVQVSNEVIGSPIFLMQLCKNARHIEVQIVGDQHGNAVALNG
    RDCSTQRRFQKIFEEGPPSIVPKETFHEMELAAQRLTQNIGYQGAGTVEYLYNAADNKFF
    FLELNPRLQVEHPVTEGITGANLPATQLQVAMGIPLFNIPDIRRLYGREDAYGTDPIDFLQ
    ERYRELDSHVIAARITAENPDEGFKPTSGSIERIKFQSTPNVWGYFSVGANGGIHEFADSQ
    FGHLFAKGPNREQARKALVLALKEMEVRGDIRNSVEYLVKLLETEAFKKNTIDTSWLDG
    IIKEKSVKVEMPSHLVVVGAAVFKAFEHVKVATEEVKESFRKGQVSTAGIPGINSFNIEVA
    YLDTKYPFHVERISPDVYRFTLDGNTIDVEVTQTAEGALLATFGGETHRIFGMDEPLGLR
    LSLDGATVLMPTIFDPSELRTDVTGKVVRYLQDNGATVEAGQPYVEVEAMKMIMPIKAT
    ESGKITHNLSAGSVISAGDLLASLELKDPSRVKKIETFSGKLDIMESKVDLEPQKAVMNVL
    SGFNLDPEAVAQQAIDSATDSSAAADLLVQVLDEFYRVESQFDGVIADDVVRTLTKANT
    ETLDVVISENLAHQQLKRRSQLLLAMIRQLDTFQDRFGREVPDAVIEALSRLSTLKDKSY
    GEIILAAEERVREAKVPSFEVRRADLRAKLADPETDLIDLSKSSTLSAGVDLLTNLFDDED
    ESVRAAAMEVYTRRVYRTYNIPELTVGVENGRLSCSFSFQFADVPAKDRVTRQGFFSVID
    DASKFAQQLPEILNSFGSKIAGDASKEGPVNVLQVGALSGDISIEDLEKATSANKDKLNM
    LGVRTVTALIPRGKKDPSYYSFPQCSGFKEDPLRRGMRPTFHHLLELGRLEENFALERIPA
    VGRNVQIYVGSEKTARRNAAQVVFLRAISHTPGLTTFSGARRALLQGLDELERAQANSK
    VSVQSSSRIYLHSLPEQSDATPEEIAKEFEGVIDKLKSRLAQRLTKLRVDEIETKVRVTVQ
    DEDGSPRVVPVRLVASSMQGEWLKTSAYIDRPDPVTGVTRERCVIGEGIDEVCELESYDS
    TSTIQTKRSIARRVGSTYAYDYLGLLEVSLLGEWDKYLSSLSGPDTPTIPSNVFEAQELLE
    GPDGELVTGKREIGTNKVGMVAWVVTMKTPEYPEGRQVVVIVNDVTVQSGSFGVEED
    EVFFKASKYARENKLPRVYIACNSGARIGLVDDLKPKFQIKFIDEASPSKGFEYLYLDDAT
    YKSLPEGSVNVRKVPEGWAITDIIGTNEGIGVENLQGSGKIAGETSRAYDEIFTLSYVTGR
    SVGIGAYLVRLGQRIIQMKQGPMILTGYGALNKLLGREVYNSNDQLGGPQVMFPNGCSH
    EIVDDDQQGIQSIIQWLSFVPKTTDAVSPVRECADPVNRDVQWRPTPTPYDPRLMLSGTD
    EELGFFDTGSWKEYLAGWGKSVVIGRGRLGGIPMGAIAVETRLVEKIIPADPADPNSREA
    VMPQAGQVLFPDSSYKTAQALRDFNNEGLPVMIFANWRGFSGGSRDMSGEILKFGSMIV
    DSLREYKHPIYIYFPPYGELRGGSWVVVDPTINEDKMTMFSDPDARGGILEPAGIVEIKFR
    LADQLKAMHRIDPQLKMLDSELESTDDTDVAAQEAIKEQIAAREELLKPVYLQAATEFA
    DLHDKTGRMKAKGVIKEAVPWARSREYFFYLAKRRIFQDNYVLQITAADPSLDSKAALE
    VLKNMCTADWDDNKAVLDYYLSSDGDITAKISEMKKAAIKAQIEQLQKALEG (SEQ ID NO:23).
    __________________________________________________________________________


 3. A vector containing the DNA of claim 1.
 4. A vector containing the DNA of claim 2.
 5. A host cell containing the vector of claim 3.
 6. A host cell containing the vector of claim 4.
 7. The host cell of claim 6, wherein said host is Cyclotella cryptica.
 8. The DNA according to claim 2 wherein the DNA sequence is:

    __________________________________________________________________________    ATGGCTCTCCGTAGGGGCCTTTACGCTGCTGCAGCGACTGCCATCTTGGTCACGGCTT                    CAGTGACCGCTTTTGGTAAGTCTGCATTTGGATTGATGGTTAGCATTCCCCACGAGCA                    GCATGTTGTGTTACGCGTTGTTGCGTAGTGTCAGTTGTGATAATTATGATCGACAAGA                    ATGGGAGGACTCTTTTTGTATCGTTTGTAGAGTGTTACACTGGACCTTCGCCTAAACA                    CGTTTGGAGGTCCTCACATCCGCGACGAGAGCTCCCACATTTCATCTACATCTCTACG                    TGAGCGAATTTACGTCACCTGGCTATTCATTTGAGGTCCCTTCCTCCCACGTGCTTCC                    ATGTTCCTTAGGGCGCTTAAGCATAGTTGCACTTGGAGCACTTGTTGTCAAATTGTCG                    TGTACCCGTCACTTTCGAAGCGTTATTTGGGGTTGGCTGGTCCTATTTAAACAGAAAT                    TATTACGATGTTTCGCTAACGATTCTTTCTCTCATTTTTTAACCTACACGAAACAGCTC                   CTCAGCATTCGACATTCACCCCCCAATCGCTCTCGGCGGCACCCACGCGCAACGTCTT                    CGGCCAGATCAAAAGCGCCTTCTTCAACCATGATGTTGCCACCTCTCGAACCATTCTT                    CACGCCGCGACACTAGATGAAACTGTTCTTTCCGCTTCAGACTCCGTCGCCAAATCTG                    TCGAAGACTACGTGAAATCCCGTGGTGGAAATCGCGTCATTCGTAAAGTCCTCATCG                     CCAACAACGGCATGGCCGCGACAAAGTCCATCCTCTCCATGCGTCAATGGGCCTACA                     TGGAATTCGGGGACGAACGTGCCATCCAGTTCGTTGCGATGGCGACTCCCGAGGATT                     TGAAGGCGAACGCCGAATTTATTCGCTTGGCGGATTCTTTCGTCGAGGTACCGGGAG                     GAAAGAACTTGAACAACTACGCCAACGTCGATGTCATTACCCGCATCGCTAAGGAGC                     AGGGGGTTGATGCCGTTTGGCCTGGATGGGGTCATGCATCTGAGAATCCGAAGCTCC                     CTAATGCGCTTGACAAATTGGGAATCAAGTTCATTGGACCAACTGGGCCTGTCATGA                     GCGTTTTGGGAGACAAGATTGCTGCGAACATTCTAGCACAGACAGCGAAAGTCCCCT                     CCATTCCCTGGAGTGGATCCTTTGGTGGACCAGACGATGGACCCCTTCAGGCGGATC                     TGACCGAGGAGGGTACTATCCCAATGGAAATCTTTAACAAGGGATTAGTAACCTCTG                     CTGATGAAGCCGTCATTGTGGCGAACAAGATTGGCTGGGAGAACGGAATCATGATCA                     AGGCTTCTGAGGGTGGAGGAGGAAAGGGTATACGCTTTGTCGACAATGAGGCCGAC                      
TTACGGAACGCGTTCGTTCAGGTGTCCAATGAAGTGATTGGCTCTCCTATTTTCCTCA                    TGCAGTTGTGTAAGAACGCTCGTCACATCGAAGTGCAAATTGTTGGCGACCAGCACG                     GAAATGCTGTAGCGTTGAACGGTCGAGATTGCTCCACTCAGCGTCGCTTCCAGAAGA                     TCTTCGAGGAAGGTCCTCCGTCCATTGTACCGAAAGAAACATTCCACGAGATGGAAC                     TTGCGGCTCAACGGTTGACTCAAAACATTGGGTATCAAGGTGCTGGAACTGTGGAAT                     ACTTGTACAACGCCGCTGACAATAAGTTTTTCTTCCTTGAGTTGAACCCCCGTCTCCA                    AGTGGAGCATCCTGTGACTGAAGGAATTACCGGCGCTAATCTTCCTGCCACTCAGCT                     TCAAGTTGCTATGGGTATTCCTCTCTTCAACATTCCTGACATTCGCCGTCTCTATGGA                    AGAGAGGATGCTTACGGAACGGATCCCATTGATTTTCTTCAAGAACGTTACCGCGAA                     CTCGACTCTCATGTAATTGCTGCCCGCATCACTGCTGAAAACCCCGATGAAGGATTCA                    AACCCACCTCAGGCTCAATTGAGCGAATCAAATTTCAATCCACCCCAAATGTTTGGG                     GATATTTCTCTGTTGGTGCTAACGGTGGAATCCATGAATTTGCCGACTCTCAGTTTGG                    CCATCTTTTCGCTAAGGGTCCGAACCGTGAGCAAGCCCGCAAGGCATTGGTTTTGGC                     TCTTAAGGAGATGGAAGTGCGCGGAGACATTCGTAACTCTGTTGAATACCTAGTCAA                     GTTGCTCGAAACTGAAGCTTTCAAGAAGAACACTATCGACACGTCTTGGTTAGATGG                     CATTATTAAGGAGAAGTCCGTTAAAGTTGAGATGCCCTCTCACTTAGTGGTTGTCGG                     AGCCGCTGTTTTCAAGGCCTTCGAACATGTTAAGGTGGCCACTGAAGAAGTTAAGGA                     ATCGTTTCGAAAAGGACAAGTCTCCACTGCAGGGATTCCAGGCATAAACTCGTTCAA                     CATCGAAGTTGCGTACTTAGACACGAAGTACCCATTCCACGTAGAACGGATCTCTCC                     AGATGTTTACAGGTTTACCTTGGACGGGAACACGATTGATGTGGAAGTTACCCAAAC                     CGCTGAAGGAGCACTTTTGGCAACCTTTGGAGGAGAGACTCATCGTATCTTTGGTAT                     GGACGAACCACTTGGCCTTCGACTGTCATTGGACGGGGCAACTGTCCTAATGTAAGT                     TGTCTGTCCCTCGATGTCGCTGTTTCATCTGTAGTCAAGTATCCTCACCTTATGTACTT                   ATTCGTAGGCCAACAATTTTTGACCCCTCTGAACTCCGCACTGATGTGACTGGAAAG                     GTTGTTCGTTACCTCCAAGACAATGGAGCAACTGTTGAAGCGGGCCAGCCCTATGTC                     
GAGGTTGAAGCGATGAAGATGATCATGCCAATCAAGGCTACTGAGTCTGGAAAAATT                     ACTCACAACCTAAGTGCTGGATCTGTAATCTCTGCTGGTGACCTTCTTGCTTCTCTCG                    AACTTAAGGATCCCTCTAGGGTTAAGAAAATAGAAACTTTTTCGGGCAAATTGGACA                     TTATGGAATCGAAGGTTGACTTAGAACCGCAGAAAGCAGTCATGAATGTCCTCTCTG                     GGTTCAACTTAGACCCTGAGGCAGTTGCGCAGCAAGCAATTGACAGTGCTACCGACA                     GCTCTGCCGCAGCCGATCTTCTTGTCCAAGTATTAGACGAATTCTATCGCGTTGAATC                    TCAGTTTGATGGTGTCATCGCTGATGATGTTGTCCGCACTCTCACCAAAGCGAACACC                    GAGACACTTGATGTTGTCATCTCCGAGAACTTGGCCCACCAGCAGCTCAAGAGGCGT                     AGTCAGCTTCTCCTCGCTATGATCCGTCAACTTGACACGTTTCAAGACAGATTTGGCA                    GAGAAGTTCCGGATGCTGTCATTGAAGCATTGAGTAGGCTTTCTACCTTGAAAGACA                     AATCTTACGGTGAAATCATTCTTGCGGCTGAGGAGAGAGTCCGCGAAGCCAAGGTGC                     CGTCCTTCGAAGTGCGTCGTGCTGATTTGCGTGCAAAGCTTGCTGACCCGGAGACAG                     ATTTGATTGACCTGAGTAAGAGCTCAACACTCTCAGCAGGGGTTGACCTTCTCACAA                     ATCTTTTTGATGACGAAGATGAATCTGTCCGCGCTGCTGCTATGGAAGTATATACTCG                    CCGTGTCTACCGTACCTACAACATCCCCGAGCTAACTGTTGGAGTTGAGAATGGCCG                     CCTCTCATGTAGCTTCTCCTTCCAATTTGCTGATGTCCCGGCGAAAGACCGTGTCACC                    CGCCAAGGGTTCTTCTCAGTTATCGACGACGCTTCAAAGTTCGCGCAACAGCTTCCTG                    AGATTCTCAACTCGTTTGGATCAAAGATCGCAGGGGATGCAAGCAAAGAAGGCCCTG                     TCAATGTTTTGCAGGTTGGTGCTCTCTCGGGAGATATCAGTATTGAGGACCTCGAGA                     AAGCTACTTCCGCTAACAAGGACAAGTTGAATATGCTTGGTGTCCGCACTGTGACGG                     CTCTTATCCCAAGGGGAAAGAAGGACCCAAGCTATTATTCATTCCCCCAATGCAGTG                     GCTTCAAGGAGGATCCTCTTCGCAGAGGCATGCGCCCAACCTTTCATCATCTCCTGGA                    ACTCGGACGGCTGGAGGAAAACTTTGCTCTTGAACGAATTCCTGCAGTTGGACGCAA                     CGTACAGATTTATGTTGGTTCCGAGAAGACGGCAAGGCGAAATGCAGCTCAAGTTGT                     TTTCTTGAGAGCTATCTCACATACTCCTGGCCTAACTACCTTCTCTGGTGCACGCCGA                    
GCTCTTCTCCAGGGGCTTGACGAATTGGAACGTGCTCAAGCAAACTCAAAGGTCAGT                     GTCCAGTCATCGTCTCGCATCTACCTTCACTCTCTCCCAGAACAGTCTGATGCAACTC                    CCGAGGAGATTGCTAAAGAATTCGAAGGTGTCATTGACAAGCTAAAGAGTCGATTGG                     CCCAACGTCTTACGAAACTGCGTGTGGATGAGATTGAAACCAAGGTTCGCGTGACTG                     TCCAGGATGAAGACGGTAGTCCCAGGGTTGTGCCTGTACGCCTTGTGGCTTCTTCAA                     TGCAAGGCGAATGGCTTAAAACATCTGCTTACATTGATCGTCCGGACCCGGTCACTG                     GAGTCACCCGTGAACGGTGCGTGATTGGAGAAGGCATTGACGAGGTTTGTGAACTTG                     AGTCGTATGACTCTACCAGTACCATCCAAACAAAGCGCTCAATTGCAAGACGTGTGG                     GATCTACCTACGCTTATGACTACCTTGGACTCCTTGAGGTCAGCTTGCTTGGAGAATG                    GGATAAGTATCTCAGCAGTCTCTCAGGACCGGACACCCCTACCATCCCGTCGAATGT                     TTTTGAAGCTCAAGAGTTACTTGAAGGACCTGATGGCGAGCTTGTCACCGGGAAACG                     TGAAATTGGAACAAATAAGGTTGGTATGGTTGCATGGGTGGTAACAATGAAAACACC                     TGAATATCCTGAGGGTCGACAGGTTGTTGTAATTGTGAACGATGTCACTGTACAAAG                     TGGTTCATTTGGAGTTGAGGAGGATGAAGTTTTCTTCAAGGCCTCCAAATATGCTCGC                    GAAAATAAGCTCCCCCGTGTCTACATTGCGTGCAACTCTGGTGCTAGAATTGGTTTG                     GTGGATGATCTCAAGCCAAAGTTCCAGATCAAATTCATTGATGAGGCGAGTCCATCT                     AAGGGTTTTGAGTACCTTTATCTTGATGATGCAACGTACAAATCTCTTCCAGAAGGGT                    CGGTAAATGTAAGGAAGGTCCCTGAAGGCTGGGCTATCACTGATATCATTGGAACGA                     ACGAAGGAATTGGGGTTGAGAACCTTCAAGGAAGTGGCAAAATTGCTGGCGAGACA                      TCAAGGGCATATGATGAAATCTTCACCTTGAGTTACGTCACAGGTAGAAGTGTTGGT                     ATTGGAGCTTACCTTGTCCGTCTCGGCCAGCGTATTATTCAGATGAAACAAGGACCC                     ATGATTCTCACAGGCTATGGTGCCCTGAATAAGCTTCTCGGCCGTGAAGTGTACAAC                     TCAAACGACCAACTTGGTGGTCCTCAAGTCATGTTCCCAAACGGCTGCTCTCATGAA                     ATTGTAGATGATGACCAACAAGGCATCCAGTCCATTATCCAATGGCTAAGCTTTGTTC                    CCAAGACAACTGATGCTGTGTCACCCGTCCGTGAATGTGCCGACCCTGTCAACAGGG                     
ATGTTCAATGGCGCCCTACCCCCACTCCTTATGATCCACGCCTCATGCTCTCAGGAAC                    TGACGAGGAACTCGGTTTTTTTGACACAGGAAGCTGGAAGGAATATCTTGCTGGCTG                     GGGGAAGAGTGTTGTTATTGGCCGCGGTCGCCTTGGTGGCATTCCTATGGGTGCTAT                     TGCCGTGGAGACCCGGCTTGTTGAGAAGATTATCCCTGCAGATCCAGCAGACCCCAA                     CTCCCGCGAAGCTGTCATGCCCCAGGCTGGACAAGTTCTTTTCCCTGACTCATCCTAC                    AAGACAGCCCAAGCTCTCCGCGACTTTAATAACGAGGGCCTCCCTGTGATGATTTTC                     GGCAACTGGCGTGGATTTAGTGGTGGAAGTCGTGACATGTCTGGTGAAATCCTCAAA                     TTTGGATCCATGATTGTCGATTCACTCCGAGAGTACAAACATCCTATTTACATATACT                    TCCCTCCATATGGTGAACTTCGAGGAGGATCGTGGGTTGTGGTGGACCCCACTATCA                     ATGAGGACAAGATGACCATGTTCTCAGATCCTGATGCTCGTGGTGGTATTCTCGAAC                     CTGCTGGTATTGTAGAAATCAAGTTCCGCTTGGCAGACCAGCTGAAAGCCATGCACC                     GCATTGATCCCCAGCTGAAGATGCTAGATTCAGAGCTTGAGTCGACAGACGACACAG                     ATGTCGCTGCTCAAGAAGCAATCAAAGAGCAGATTGCTGCAAGAGAGGAGCTTCTTA                     AACCCGTCTATCTTCAGGCTGCTACTGAATTTGCTGATCTCCACGACAAGACGGGAC                     GGATGAAGGCGAAGGGTGTTATCAAAGAAGCAGTTCCATGGGCTCGCTCTCGTGAAT                     ACTTCTTTTATCTTGCTAAGCGCCGCATTTTTCAAGACAACTATGTGTTGCAAATCAC                    TGCTGCTGATCCTTCGTTAGACTCTAAGGCTGCTCTTGAGGTGTTGAAGAACATGTGC                    ACTGCAGACTGGGATGACAACAAAGCCGTTCTTGACTATTATCTGTCCAGCGATGGA                     GACATCACAGCCAAGATTAGCGAGATGAAGAAGGCAGCTATCAAGGCACAGATCGA                      GCAGCTTCAGAAAGCTTTGGAGGGTTGA (SEQ ID NO:22).                                  __________________________________________________________________________     .


9. The DNA of claim 2 having the sequence:

    __________________________________________________________________________    ATGGCTCTCCGTAGGGGCCTTTACGCTGCTGCAGCGACTGCCATCTTGGTCACGGCTT                    CAGTGACCGCTTTTGCTCCTCAGCATTCGACATTCACCCCCCAATCGCTCTCGGCGGC                    ACCCACGCGCAACGTCTTCGGCCAGATCAAAAGCGCCTTCTTCAACCATGATGTTGC                     CACCTCTCGAACCATTCTTCACGCCGCGACACTAGATGAAACTGTTCTTTCCGCTTCA                    GACTCCGTCGCCAAATCTGTCGAAGACTACGTGAAATCCCGTGGTGGAAATCGCGTC                     ATTCGTAAAGTCCTCATCGCCAACAACGGCATGGCCGCGACAAAGTCCATCCTCTCC                     ATGCGTCAATGGGCCTACATGGAATTCGGGGACGAACGTGCCATCCAGTTCGTTGCG                     ATGGCGACTCCCGAGGATTTGAAGGCGAACGCCGAATTTATTCGCTTGGCGGATTCT                     TTCGTCGAGGTACCGGGAGGAAAGAACTTGAACAACTACGCCAACGTCGATGTCATT                     ACCCGCATCGCTAAGGAGCAGGGGGTTGATGCCGTTTGGCCTGGATGGGGTCATGCA                     TCTGAGAATCCGAAGCTCCCTAATGCGCTTGACAAATTGGGAATCAAGTTCATTGGA                     CCAACTGGGCCTGTCATGAGCGTTTTGGGAGACAAGATTGCTGCGAACATTCTAGCA                     CAGACAGCGAAAGTCCCCTCCATTCCCTGGAGTGGATCCTTTGGTGGACCAGACGAT                     GGACCCCTTCAGGCGGATCTGACCGAGGAGGGTACTATCCCAATGGAAATCTTTAAC                     AAGGGATTAGTAACCTCTGCTGATGAAGCCGTCATTGTGGCGAACAAGATTGGCTGG                     GAGAACGGAATCATGATCAAGGCTTCTGAGGGTGGAGGAGGAAAGGGTATACGCTT                      TGTCGACAATGAGGCCGACTTACGGAACGCGTTCGTTCAGGTGTCCAATGAAGTGAT                     TGGCTCTCCTATTTTCCTCATGCAGTTGTGTAAGAACGCTCGTCACATCGAAGTGCAA                    ATTGTTGGCGACCAGCACGGAAATGCTGTAGCGTTGAACGGTCGAGATTGCTCCACT                     CAGCGTCGCTTCCAGAAGATCTTCGAGGAAGGTCCTCCGTCCATTGTACCGAAAGAA                     ACATTCCACGAGATGGAACTTGCGGCTCAACGGTTGACTCAAAACATTGGGTATCAA                     GGTGCTGGAACTGTGGAATACTTGTACAACGCCGCTGACAATAAGTTTTTCTTCCTTG                    AGTTGAACCCCCGTCTCCAAGTGGAGCATCCTGTGACTGAAGGAATTACCGGCGCTA                     ATCTTCCTGCCACTCAGCTTCAAGTTGCTATGGGTATTCCTCTCTTCAACATTCCTGA                    
CATTCGCCGTCTCTATGGAAGAGAGGATGCTTACGGAACGGATCCCATTGATTTTCTT                    CAAGAACGTTACCGCGAACTCGACTCTCATGTAATTGCTGCCCGCATCACTGCTGAA                     AACCCCGATGAAGGATTCAAACCCACCTCAGGCTCAATTGAGCGAATCAAATTTCAA                     TCCACCCCAAATGTTTGGGGATATTTCTCTGTTGGTGCTAACGGTGGAATCCATGAAT                    TTGCCGACTCTCAGTTTGGCCATCTTTTCGCTAAGGGTCCGAACCGTGAGCAAGCCCG                    CAAGGCATTGGTTTTGGCTCTTAAGGAGATGGAAGTGCGCGGAGACATTCGTAACTC                     TGTTGAATACCTAGTCAAGTTGCTCGAAACTGAAGCTTTCAAGAAGAACACTATCGA                     CACGTCTTGGTTAGATGGCATTATTAAGGAGAAGTCCGTTAAAGTTGAGATGCCCTC                     TCACTTAGTGGTTGTCGGAGCCGCTGTTTTCAAGGCCTTCGAACATGTTAAGGTGGCC                    ACTGAAGAAGTTAAGGAATCGTTTCGAAAAGGACAAGTCTCCACTGCAGGGATTCCA                     GGCATAAACTCGTTCAACATCGAAGTTGCGTACTTAGACACGAAGTACCCATTCCAC                     GTAGAACGGATCTCTCCAGATGTTTACAGGTTTACCTTGGACGGGAACACGATTGAT                     GTGGAAGTTACCCAAACCGCTGAAGGAGCACTTTTGGCAACCTTTGGAGGAGAGACT                     CATCGTATCTTTGGTATGGACGAACCACTTGGCCTTCGACTGTCATTGGACGGGGCA                     ACTGTCCTAATGCCAACAATTTTTGACCCCTCTGAACTCCGCACTGATGTGACTGGAA                    AGGTTGTTCGTTACCTCCAAGACAATGGAGCAACTGTTGAAGCGGGCCAGCCCTATG                     TCGAGGTTGAAGCGATGAAGATGATCATGCCAATCAAGGCTACTGAGTCTGGAAAAA                     TTACTCACAACCTAAGTGCTGGATCTGTAATCTCTGCTGGTGACCTTCTTGCTTCTCT                    CGAACTTAAGGATCCCTCTAGGGTTAAGAAAATAGAAACTTTTTCGGGCAAATTGGA                     CATTATGGAATCGAAGGTTGACTTAGAACCGCAGAAAGCAGTCATGAATGTCCTCTC                     TGGGTTCAACTTAGACCCTGAGGCAGTTGCGCAGCAAGCAATTGACAGTGCTACCGA                     CAGCTCTGCCGCAGCCGATCTTCTTGTCCAAGTATTAGACGAATTCTATCGCGTTGAA                    TCTCAGTTTGATGGTGTCATCGCTGATGATGTTGTCCGCACTCTCACCAAAGCGAACA                    CCGAGACACTTGATGTTGTCATCTCCGAGAACTTGGCCCACCAGCAGCTCAAGAGGC                     GTAGTCAGCTTCTCCTCGCTATGATCCGTCAACTTGACACGTTTCAAGACAGATTTGG                    
CAGAGAAGTTCCGGATGCTGTCATTGAAGCATTGAGTAGGCTTTCTACCTTGAAAGA                     CAAATCTTACGGTGAAATCATTCTTGCGGCTGAGGAGAGAGTCCGCGAAGCCAAGGT                     GCCGTCCTTCGAAGTGCGTCGTGCTGATTTGCGTGCAAAGCTTGCTGACCCGGAGAC                     AGATTTGATTGACCTGAGTAAGAGCTCAACACTCTCAGCAGGGGTTGACCTTCTCAC                     AAATCTTTTTGATGACGAAGATGAATCTGTCCGCGCTGCTGCTATGGAAGTATATACT                    CGCCGTGTCTACCGTACCTACAACATCCCCGAGCTAACTGTTGGAGTTGAGAATGGC                     CGCCTCTCATGTAGCTTCTCCTTCCAATTTGCTGATGTCCCGGCGAAAGACCGTGTCA                    CCCGCCAAGGGTTCTTCTCAGTTATCGACGACGCTTCAAAGTTCGCGCAACAGCTTCC                    TGAGATTCTCAACTCGTTTGGATCAAAGATCGCAGGGGATGCAAGCAAAGAAGGCCC                     TGTCAATGTTTTGCAGGTTGGTGCTCTCTCGGGAGATATCAGTATTGAGGACCTCGA                     GAAAGCTACTTCCGCTAACAAGGACAAGTTGAATATGCTTGGTGTCCGCACTGTGAC                     GGCTCTTATCCCAAGGGGAAAGAAGGACCCAAGCTATTATTCATTCCCCCAATGCAG                     TGGCTTCAAGGAGGATCCTCTTCGCAGAGGCATGCGCCCAACCTTTCATCATCTCCTG                    GAACTCGGACGGCTGGAGGAAAACTTTGCTCTTGAACGAATTCCTGCAGTTGGACGC                     AACGTACAGATTTATGTTGGTTCCGAGAAGACGGCAAGGCGAAATGCAGCTCAAGTT                     GTTTTCTTGAGAGCTATCTCACATACTCCTGGCCTAACTACCTTCTCTGGTGCACGCC                    GAGCTCTTCTCCAGGGGCTTGACGAATTGGAACGTGCTCAAGCAAACTCAAAGGTCA                     GTGTCCAGTCATCGTCTCGCATCTACCTTCACTCTCTCCCAGAACAGTCTGATGCAAC                    TCCCGAGGAGATTGCTAAAGAATTCGAAGGTGTCATTGACAAGCTAAAGAGTCGATT                     GGCCCAACGTCTTACGAAACTGCGTGTGGATGAGATTGAAACCAAGGTTCGCGTGAC                     TGTCCAGGATGAAGACGGTAGTCCCAGGGTTGTGCCTGTACGCCTTGTGGCTTCTTC                     AATGCAAGGCGAATGGCTTAAAACATCTGCTTACATTGATCGTCCGGACCCGGTCAC                     TGGAGTCACCCGTGAACGGTGCGTGATTGGAGAAGGCATTGACGAGGTTTGTGAACT                     TGAGTCGTATGACTCTACCAGTACCATCCAAACAAAGCGCTCAATTGCAAGACGTGT                     GGGATCTACCTACGCTTATGACTACCTTGGACTCCTTGAGGTCAGCTTGCTTGGAGAA                    
TGGGATAAGTATCTCAGCAGTCTCTCAGGACCGGACACCCCTACCATCCCGTCGAAT                     GTTTTTGAAGCTCAAGAGTTACTTGAAGGACCTGATGGCGAGCTTGTCACCGGGAAA                     CGTGAAATTGGAACAAATAAGGTTGGTATGGTTGCATGGGTGGTAACAATGAAAACA                     CCTGAATATCCTGAGGGTCGACAGGTTGTTGTAATTGTGAACGATGTCACTGTACAA                     AGTGGTTCATTTGGAGTTGAGGAGGATGAAGTTTTCTTCAAGGCCTCCAAATATGCT                     CGCGAAAATAAGCTCCCCCGTGTCTACATTGCGTGCAACTCTGGTGCTAGAATTGGTT                    TGGTGGATGATCTCAAGCCAAAGTTCCAGATCAAATTCATTGATGAGGCGAGTCCAT                     CTAAGGGTTTTGAGTACCTTTATCTTGATGATGCAACGTACAAATCTCTTCCAGAAGG                    GTCGGTAAATGTAAGGAAGGTCCCTGAAGGCTGGGCTATCACTGATATCATTGGAAC                     GAACGAAGGAATTGGGGTTGAGAACCTTCAAGGAAGTGGCAAAATTGCTGGCGAGA                      CATCAAGGGCATATGATGAAATCTTCACCTTGAGTTACGTCACAGGTAGAAGTGTTG                     GTATTGGAGCTTACCTTGTCCGTCTCGGCCAGCGTATTATTCAGATGAAACAAGGAC                     CCATGATTCTCACAGGCTATGGTGCCCTGAATAAGCTTCTCGGCCGTGAAGTGTACA                     ACTCAAACGACCAACTTGGTGGTCCTCAAGTCATGTTCCCAAACGGCTGCTCTCATGA                    AATTGTAGATGATGACCAACAAGGCATCCAGTCCATTATCCAATGGCTAAGCTTTGTT                    CCCAAGACAACTGATGCTGTGTCACCCGTCCGTGAATGTGCCGACCCTGTCAACAGG                     GATGTTCAATGGCGCCCTACCCCCACTCCTTATGATCCACGCCTCATGCTCTCAGGAA                    CTGACGAGGAACTCGGTTTTTTTGACACAGGAAGCTGGAAGGAATATCTTGCTGGCT                     GGGGGAAGAGTGTTGTTATTGGCCGCGGTCGCCTTGGTGGCATTCCTATGGGTGCTA                     TTGCCGTGGAGACCCGGCTTGTTGAGAAGATTATCCCTGCAGATCCAGCAGACCCCA                     ACTCCCGCGAAGCTGTCATGCCCCAGGCTGGACAAGTTCTTTTCCCTGACTCATCCTA                    CAAGACAGCCCAAGCTCTCCGCGACTTTAATAACGAGGGCCTCCCTGTGATGATTTTC                    GGCAACTGGCGTGGATTTAGTGGTGGAAGTCGTGACATGTCTGGTGAAATCCTCAAA                     TTTGGATCCATGATTGTCGATTCACTCCGAGAGTACAAACATCCTATTTACATATACT                    TCCCTCCATATGGTGAACTTCGAGGAGGATCGTGGGTTGTGGTGGACCCCACTATCA                     
ATGAGGACAAGATGACCATGTTCTCAGATCCTGATGCTCGTGGTGGTATTCTCGAAC                     CTGCTGGTATTGTAGAAATCAAGTTCCGCTTGGCAGACCAGCTGAAAGCCATGCACC                     GCATTGATCCCCAGCTGAAGATGCTAGATTCAGAGCTTGAGTCGACAGACGACACAG                     ATGTCGCTGCTCAAGAAGCAATCAAAGAGCAGATTGCTGCAAGAGAGGAGCTTCTTA                     AACCCGTCTATCTTCAGGCTGCTACTGAATTTGCTGATCTCCACGACAAGACGGGAC                     GGATGAAGGCGAAGGGTGTTATCAAAGAAGCAGTTCCATGGGCTCGCTCTCGTGAAT                     ACTTCTTTTATCTTGCTAAGCGCCGCATTTTTCAAGACAACTATGTGTTGCAAATCAC                    TGCTGCTGATCCTTCGTTAGACTCTAAGGCTGCTCTTGAGGTGTTGAAGAACATGTGC                    ACTGCAGACTGGGATGACAACAAAGCCGTTCTTGACTATTATCTGTCCAGCGATGGA                     GACATCACAGCCAAGATTAGCGAGATGAAGAAGGCAGCTATCAAGGCACAGATCGA                      GCAGCTTCAGAAAGCTTTGGAGGGTTGA (SEQ ID NO:25).                                  __________________________________________________________________________     .
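
The sequences recited in claims 8 and 9 are printed in fixed-width rows padded with spaces, bounded by rule lines, and followed by a "(SEQ ID NO:n)" label. As printed, both begin with ATG and end with TGA. The following is a minimal sketch, assuming Python, of how a reader might collapse such a listing into one contiguous string and check those two features. The function names `normalize_listing` and `summarize` are hypothetical and are not part of the claimed subject matter; the sketch performs only basic text checks and does not translate or otherwise interpret the sequence.

```python
import re

# Standard stop codons in the universal genetic code (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def normalize_listing(text: str) -> str:
    """Collapse a printed sequence listing into one uppercase string.

    Assumes `text` holds only the sequence block itself (rows of A/C/G/T
    with layout padding), not surrounding claim wording: any stray
    a/c/g/t letters from prose would be kept by mistake.
    """
    # Drop the trailing "(SEQ ID NO:n)" label first.
    text = re.sub(r"\(SEQ ID NO:\s*\d+\)", "", text)
    # Remove everything that is not a nucleotide letter
    # (spaces, newlines, underscore rules, periods).
    return re.sub(r"[^ACGTacgt]", "", text).upper()


def summarize(seq: str) -> dict:
    """Report length and whether the string starts with ATG and ends in a stop codon."""
    return {
        "length": len(seq),
        "starts_with_atg": seq.startswith("ATG"),
        "ends_with_stop": seq[-3:] in STOP_CODONS,
    }


if __name__ == "__main__":
    # Toy listing for illustration only; it is NOT one of the claimed sequences.
    toy = "  ______\n ATGGCT CTCCGT   \n GGTTGA (SEQ ID NO:99).   ______ ."
    seq = normalize_listing(toy)
    print(seq)
    print(summarize(seq))
```

To apply the sketch to a claimed sequence, the text between the two rule lines of the claim would be passed to `normalize_listing` in place of the toy string.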